several27/FakeNewsCorpus
A dataset of millions of news articles scraped from a curated list of data sources.
This dataset offers millions of news articles from various sources, categorized by type (such as fake news, satire, or credible). Each record includes the raw content, title, authors, and other metadata, so it can be fed directly into automated analysis pipelines. It is aimed at data scientists, researchers, and anyone building content-verification tools.
413 stars. No commits in the last 6 months.
Use this if you need a large, pre-labeled corpus of news articles to train machine learning models for identifying different types of news, particularly for 'fake news' detection.
Not ideal if you need a constantly updated news dataset for real-time analysis, as this dataset is not planned for continuous updates.
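As a rough illustration of the use case above, the sketch below tallies articles per label while streaming a CSV file row by row, which matters for a corpus this large. The column names (`type`, `title`, `content`) and the CSV layout are assumptions based on the description on this page, not a confirmed schema; check the actual file header before relying on them.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Yield one dict per CSV row without loading the whole file into memory."""
    # Article bodies can be very long, so raise the csv module's field size limit.
    csv.field_size_limit(2**31 - 1)
    yield from csv.DictReader(handle)


def count_by_type(rows, label_field="type"):
    """Tally rows per label; `label_field` is an assumed column name."""
    counts = Counter()
    for row in rows:
        # Missing or empty labels are grouped under "unknown".
        label = (row.get(label_field) or "").strip() or "unknown"
        counts[label] += 1
    return counts


# Tiny in-memory stand-in for the real file (hypothetical rows, assumed columns).
sample = io.StringIO(
    "type,title,content\n"
    "fake,Headline A,Body A\n"
    "satire,Headline B,Body B\n"
    "fake,Headline C,Body C\n"
)
label_counts = count_by_type(read_rows(sample))
print(label_counts)
```

For the real corpus, replace `sample` with an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points at the downloaded file.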
Stars: 413
Forks: 98
Language: —
License: Apache-2.0
Category:
Last pushed: Jan 25, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/several27/FakeNewsCorpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
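The same request can be made from Python using only the standard library. This is a minimal sketch: the URL pattern (category, then owner/repo) is inferred from the single curl example above, and the response is assumed to be JSON, which this page does not state. The function names are illustrative, not part of any published client.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; the path layout beyond it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, repo):
    """Build the endpoint URL for a repository given as 'owner/name'."""
    return f"{BASE_URL}/{quote(category)}/{quote(repo, safe='/')}"


def fetch_quality(category, repo, timeout=10):
    """Fetch the record for one repository (assumes a JSON response body)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


url = quality_url("nlp", "several27/FakeNewsCorpus")
print(url)
```

Calling `fetch_quality("nlp", "several27/FakeNewsCorpus")` performs the network request. With the keyless limit of 100 requests/day, it is worth caching the result rather than calling it repeatedly.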
Related tools
openfactcheck-research/openfactcheck
An Open-source Factuality Evaluation Demo for LLMs
lilakk/BooookScore
A package to generate summaries of long-form text and evaluate the coherence of these summaries....
Cartus/Automated-Fact-Checking-Resources
Links to conference/journal publications in automated fact-checking (resources for the...
armingh2000/FactScoreLite
FactScoreLite is an implementation of the FactScore metric, designed for detailed accuracy...
manideep2510/siamese-BERT-fake-news-detection-LIAR
Triple Branch BERT Siamese Network for fake news classification on LIAR-PLUS dataset in PyTorch