fajri91/sum_liputan6

The first large-scale summarization corpus for the Indonesian language. AACL 2020.

32 / 100 (Emerging)

This project provides a large collection of Indonesian news articles from 2000-2010, each paired with a summary. It is designed to help researchers develop and test automated tools that condense long articles into shorter, coherent summaries. The corpus serves as both training and evaluation data for summarization models. It is aimed at natural language processing researchers, especially those working on the Indonesian language.

No commits in the last 6 months.

Use this if you are an NLP researcher or academic needing a substantial, pre-processed dataset of Indonesian news articles and their summaries for developing or evaluating text summarization algorithms.

Not ideal if you need something for commercial use, since this corpus is restricted to non-commercial academic research, or if you want ready-made summarization software rather than a dataset for building it.

natural-language-processing computational-linguistics text-summarization academic-research indonesian-language
No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 8 / 25
Community 17 / 25
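
The four category scores above add up to the overall score shown at the top (0 + 7 + 8 + 17 = 32). The page does not state the formula, so the sketch below assumes the overall score is a plain, unweighted sum of the four 25-point categories. That assumption is an inference from the numbers listed here.

```python
# Category scores as listed on this page (each out of 25).
category_scores = {
    "Maintenance": 0,
    "Adoption": 7,
    "Maturity": 8,
    "Community": 17,
}

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
max_total = 25 * len(category_scores)

print(f"{total} / {max_total}")
```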


Stars: 38
Forks: 9
Language: Python
License: none listed
Last pushed: Mar 04, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fajri91/sum_liputan6"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
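
The same request can be made from Python. The sketch below uses only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, as is treating the `transformers` path segment as a category parameter. It also assumes the endpoint returns a JSON body, which this page does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. If it is not,
    json.loads raises an error instead of returning partial data.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("transformers", "fajri91", "sum_liputan6")` performs a real network request and counts against the daily limit, so caching the result is sensible when polling more than a handful of repositories.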