EmilHvitfeldt/R-text-data
List of textual data sources to be used for text mining in R
This is a curated list of textual data sources, primarily for those working with R, to practice text analysis and natural language processing. It provides readily available datasets ranging from classic literature to religious texts and TV show scripts, formatted for easy use. Text analysts, researchers, and students can use this to quickly obtain diverse text data for their projects without extensive data wrangling.
150 stars. No commits in the last 6 months.
Use this if you need various types of pre-processed text data to jumpstart your text mining or NLP project in R.
Not ideal if you're looking for a guide on how to perform text analysis or if you need to build custom datasets from raw web sources.
Stars: 150
Forks: 15
Language: not listed
License: not listed
Category:
Last pushed: Aug 17, 2021
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/EmilHvitfeldt/R-text-data"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
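For scripted access, the same request can be made from Python. The following is a minimal sketch rather than a documented client: the helper names `quality_url` and `fetch_quality` are hypothetical, the `category/owner/repo` path layout is inferred only from the single example URL above, and the page does not confirm that the response is JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path copied from the curl example; everything after it is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (path layout inferred from the one example URL)."""
    # Percent-encode each segment so unusual characters cannot break the path.
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch one record; assumes a JSON body, which the page does not confirm."""
    with urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to actually hit the network.
    print(quality_url("nlp", "EmilHvitfeldt", "R-text-data"))
```

Building the URL separately from fetching it keeps the network call optional, which helps when staying under the 100 requests/day limit.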
Higher-rated alternatives
quanteda/quanteda
An R package for the Quantitative Analysis of Textual Data
juliasilge/tidytext
Text mining using tidy tools
massimoaria/tall
Text Analysis for aLL
keyATM/keyATM
An R package for Keyword Assisted Topic Models
gagolews/stringi
Fast and Portable Character String Processing in R (with the Unicode ICU)