boudinfl/ake-datasets

Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.

46 / 100 (Emerging)

This collection provides carefully assembled datasets for testing and comparing methods that automatically identify important keywords or phrases in documents. It covers full papers, abstracts, and news articles, and supplies structured files containing the original text along with "gold standard" keyphrases, the reference answers against which a system's output is scored. It is intended for researchers and practitioners working on text analysis, information retrieval, or content summarization.

147 stars. No commits in the last 6 months.

Use this if you are developing or evaluating algorithms that extract keyphrases from texts and need a standardized set of documents with known correct keyphrases to benchmark your system's performance.

Not ideal if you are looking for a tool to perform keyphrase extraction directly or need datasets in a format other than XML or JSON.
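As a rough illustration of what benchmarking against gold standard keyphrases involves, the sketch below computes exact-match precision, recall, and F1 for the top-k predicted keyphrases of one document. The function name `prf_at_k`, the lowercase-and-whitespace normalization, and the sample phrases are assumptions made for this example; they are not taken from the repository, and published evaluations often apply stemming before matching.

```python
def prf_at_k(predicted, gold, k=10):
    """Exact-match precision, recall and F1 for the top-k predictions.

    Hypothetical helper: normalization here is only lowercasing and
    whitespace collapsing (an assumption, not the repository's method).
    """
    def norm(phrase):
        return " ".join(phrase.lower().split())

    # Keep ranking order, drop empty strings and duplicates, then cut at k.
    top = []
    for phrase in predicted:
        n = norm(phrase)
        if n and n not in top:
            top.append(n)
    top = top[:k]

    gold_set = {norm(g) for g in gold}
    gold_set.discard("")

    hits = sum(1 for phrase in top if phrase in gold_set)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up phrases, for illustration only.
predicted = ["Keyphrase extraction", "graph ranking", "neural networks", "evaluation"]
gold = ["keyphrase extraction", "evaluation", "benchmark datasets"]
p, r, f = prf_at_k(predicted, gold, k=10)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Scores for a whole dataset are usually obtained by averaging these per-document values, though the exact protocol varies between papers.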

text-mining information-extraction natural-language-processing academic-research content-analysis
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25

Stars: 147
Forks: 28
Language: Shell
License: Apache-2.0
Last pushed: Jul 03, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/boudinfl/ake-datasets"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
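The same request can be made from a script. The sketch below uses only the Python standard library. The `category/owner/repo` path layout is inferred from the single URL shown above, and the response is assumed to be JSON; neither is confirmed by this page, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL (path layout is an assumption from one example)."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category, owner, repo, timeout=10.0):
    """Fetch the record and decode it, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Performs a real network request and counts against the daily limit.
    print(fetch_quality("nlp", "boudinfl", "ake-datasets"))
```

Because the response fields are not documented here, the example prints the decoded payload as-is instead of reading specific keys.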