fajri91/sum_liputan6
The first large-scale summarization corpus for the Indonesian language. AACL 2020.
This project provides a large collection of Indonesian news articles published between 2000 and 2010, each paired with a summary. It is designed to help researchers build and test automated tools that condense long articles into shorter, coherent summaries, and it can be used both to train and to evaluate summarization models. Its main audience is natural language processing researchers, especially those working on the Indonesian language.
No commits in the last 6 months.
Use this if you are an NLP researcher or academic needing a substantial, pre-processed dataset of Indonesian news articles and their summaries for developing or evaluating text summarization algorithms.
Not ideal if you need something for commercial use, since this corpus is strictly for non-commercial academic research, or if you are looking for ready-made summarization software rather than a dataset for building it.
Stars: 38
Forks: 9
Language: Python
License: not listed
Category: not listed
Last pushed: Mar 04, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/fajri91/sum_liputan6"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
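The same request can be made from Python. The sketch below uses only the standard library and mirrors the URL shape of the curl command above. The helper names `build_quality_url` and `fetch_quality` are hypothetical, and treating the response body as JSON is an assumption, since this page does not describe the response format. How an API key is passed is not documented here, so the sketch covers keyless access only.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    # Percent-encode each path segment so unusual characters cannot break the URL.
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the endpoint returns a JSON body. Adjust the parsing
    if the actual response format differs.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the URL only; call fetch_quality() to perform the network request.
    print(build_quality_url("fajri91", "sum_liputan6"))
```

Because keyless access is capped at 100 requests/day, it is worth caching results locally rather than calling `fetch_quality` in a loop.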
Higher-rated alternatives
abelriboulot/onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using...
pszemraj/textsum
CLI & Python API to easily summarize text-based files with transformers
rojagtap/transformer-abstractive-summarization
Abstractive Text Summarization using Transformer
HHousen/DocSum
A tool to automatically summarize documents abstractively using the BART or PreSumm Machine...
abhilash1910/LongPegasus
LongPegasus package is used for inducing longformer self attention over base pegasus abstractive...