ieg-dhr/NLP-Course4Humanities_2024

This repository is part of an NLP course for humanities and cultural studies. The course uses historical newspapers as its source material and applies NLP methods to them. Tasks covered: tokenization, lemmatization, TF-IDF, part-of-speech tagging, semantic search with transformers, article extraction and OCR post-correction with LLMs, named entity recognition (NER), and text classification.

Score: 32 / 100 (Emerging)

This project helps humanities scholars and cultural studies researchers analyze large collections of historical newspaper texts. It takes raw historical newspaper data, often with OCR errors, and applies natural language processing techniques to extract insights. Researchers can identify key themes, recognize entities like people and places, and semantically search through articles.

No commits in the last 6 months.

Use this if you are a humanities or cultural studies researcher looking to apply computational methods to large historical text datasets, especially digitized newspapers.

Not ideal if you are a developer looking for an NLP library or a practitioner outside of humanities and cultural studies.

digital-humanities cultural-studies historical-research text-analysis newspaper-archives
No License, Stale 6m, No Package, No Dependents
Maintenance 2 / 25
Adoption 6 / 25
Maturity 8 / 25
Community 16 / 25

Stars: 19
Forks: 6
Language: Jupyter Notebook
License: none
Last pushed: Jun 05, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/ieg-dhr/NLP-Course4Humanities_2024"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
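For scripted access, the same request can be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical, the reading of the three path segments as category, owner, and repository is an assumption drawn from the curl example, and the response is assumed (not confirmed) to be JSON.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example; everything after it is assumed
# to be <category>/<owner>/<repository>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Percent-encode each path segment and join it onto the base URL."""
    segments = (quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/{'/'.join(segments)}"


def fetch_quality(url: str, timeout: float = 10.0) -> object:
    """GET the URL and decode the body, assuming the endpoint returns JSON."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = build_quality_url("nlp", "ieg-dhr", "NLP-Course4Humanities_2024")
print(url)
# data = fetch_quality(url)  # performs the request; counts against the daily limit
```

The network call is left commented out so that running the sketch does not consume any of the daily request allowance; the structure of the returned data is not documented here and should be inspected before relying on any field.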