boudinfl/ake-datasets
Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.
This collection provides carefully assembled datasets for testing and comparing methods that automatically identify important keywords or phrases in documents. It covers full papers, abstracts, and news articles, and provides structured files containing the original text along with the 'gold standard' keyphrases that serve as the reference answers. It is aimed at researchers and practitioners working on text analysis, information retrieval, or content summarization.
147 stars. No commits in the last 6 months.
Use this if you are developing or evaluating algorithms that extract keyphrases from texts and need a standardized set of documents with known correct keyphrases to benchmark your system's performance.
Not ideal if you are looking for a tool to perform keyphrase extraction directly or need datasets in a format other than XML or JSON.
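As a rough illustration of what benchmarking against gold-standard keyphrases can look like, the sketch below scores a list of predicted keyphrases with precision, recall, and F1. The function name `score_keyphrases`, the sample phrases, and the exact-match-after-lowercasing criterion are assumptions made for this example; they are not taken from the repository, and published evaluations often apply further normalization such as stemming.

```python
def score_keyphrases(predicted, gold):
    """Return (precision, recall, f1) using exact match on lowercased phrases.

    Hypothetical helper for illustration; not part of ake-datasets.
    """
    # Normalize and deduplicate both lists before comparing.
    pred_set = {p.strip().lower() for p in predicted}
    gold_set = {g.strip().lower() for g in gold}

    matched = len(pred_set & gold_set)
    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean; define it as 0 when nothing matches.
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up sample data, not drawn from any dataset in the collection.
    predicted = ["Keyphrase extraction", "benchmark", "neural networks"]
    gold = ["keyphrase extraction", "benchmark datasets", "benchmark"]
    p, r, f = score_keyphrases(predicted, gold)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In practice the per-document scores would be averaged over every document in a dataset, with the predicted and gold lists read from the dataset's files.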
Stars: 147
Forks: 28
Language: Shell
License: Apache-2.0
Category:
Last pushed: Jul 03, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/boudinfl/ake-datasets"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
vi3k6i5/flashtext
Extract keywords from sentences or replace keywords in sentences.
alirezatheh/perke
A keyphrase extractor for Persian
andrewtavis/kwx
BERT, LDA, and TFIDF based keyword extraction in Python
cbaziotis/ekphrasis
Ekphrasis is a text processing tool, geared towards text from social networks, such as Twitter...
lovit/KR-WordRank
A library that automatically extracts words/keywords from Korean text using unsupervised learning.