ieg-dhr/NLP-Course4Humanities_2024
This repository is part of an NLP course for humanities and cultural studies. The course uses historical newspapers as source material and applies NLP methods to them. NLP tasks covered: tokenization, lemmatization, TF-IDF, part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, NER, and text classification.
This project helps humanities scholars and cultural studies researchers analyze large collections of historical newspaper texts. It takes raw historical newspaper data, often with OCR errors, and applies natural language processing techniques to extract insights. Researchers can identify key themes, recognize entities like people and places, and semantically search through articles.
No commits in the last 6 months.
Use this if you are a humanities or cultural studies researcher looking to apply computational methods to large historical text datasets, especially digitized newspapers.
Not ideal if you are a developer looking for an NLP library or a practitioner outside of humanities and cultural studies.
Stars: 19
Forks: 6
Language: Jupyter Notebook
License: —
Category:
Last pushed: Jun 05, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ieg-dhr/NLP-Course4Humanities_2024"
Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
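The same request can be made from Python with only the standard library. This is a minimal sketch, not an official client. The function names `build_quality_url` and `fetch_quality` are hypothetical, the path pattern `quality/<category>/<owner>/<repo>` is inferred from the single curl example above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
    Each part is percent-encoded so unusual characters cannot alter the path.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository and decode it.

    Assumption: the body is JSON. If it is not, json.loads raises ValueError.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request and counts against the daily limit):
# data = fetch_quality("nlp", "ieg-dhr", "NLP-Course4Humanities_2024")
```

Building the URL in a separate function keeps it checkable without a network call, which is useful given the daily request limit.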
Higher-rated alternatives
natasha/natasha
Solves basic Russian NLP tasks, API for lower level Natasha projects
monikkinom/ner-lstm
Named Entity Recognition using multilayered bidirectional LSTM
ancatmara/data-science-nlp
NLP Section of the Data Science course, NRU HSE
mhbashari/awesome-persian-nlp-ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
soheil-mp/Natural-Language-Processing-Tutorials
NLP Webinars Created for Udacity's Mentorship Program (2019).