JonathanReeve/corpus-db

A textual corpus database for the digital humanities.

/ 100

Emerging

This project helps digital humanities researchers and literary scholars find and download specific collections of public domain texts. You specify criteria such as literary genre, author, publication decade, or setting, and it returns a curated subcorpus of books for your analysis. It is aimed at academics, students, and anyone doing literary research.
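The criteria-based selection described above can be sketched as a simple metadata filter. This is a minimal, hypothetical Python sketch and not corpus-db's actual interface. The `BookRecord` fields, the `build_subcorpus` function, and the three-book sample catalog are assumptions for illustration only, and setting-based filtering is left out for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class BookRecord:
    # Hypothetical metadata fields for illustration; corpus-db's real schema may differ.
    title: str
    author: str
    year: int
    genres: FrozenSet[str]


def build_subcorpus(
    records: Iterable[BookRecord],
    genre: Optional[str] = None,
    author: Optional[str] = None,
    decade: Optional[int] = None,
) -> List[BookRecord]:
    """Keep only records matching every criterion supplied; None means 'no filter'."""
    selected = []
    for rec in records:
        if genre is not None and genre not in rec.genres:
            continue
        if author is not None and rec.author != author:
            continue
        # Round the publication year down to its decade, e.g. 1813 -> 1810.
        if decade is not None and (rec.year // 10) * 10 != decade:
            continue
        selected.append(rec)
    return selected


# Small illustrative catalog (not drawn from corpus-db itself).
books = [
    BookRecord("Pride and Prejudice", "Jane Austen", 1813, frozenset({"novel"})),
    BookRecord("Emma", "Jane Austen", 1815, frozenset({"novel"})),
    BookRecord("Dracula", "Bram Stoker", 1897, frozenset({"novel", "gothic"})),
]

austen_1810s = build_subcorpus(books, author="Jane Austen", decade=1810)
print([b.title for b in austen_1810s])  # ['Pride and Prejudice', 'Emma']
```

Each criterion is optional, so combining filters narrows the subcorpus, while calling `build_subcorpus(books)` with no criteria returns the whole catalog.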

No commits in the last 6 months.

Use this if you need to quickly assemble a dataset of texts with particular characteristics for literary analysis or computational humanities projects.

Not ideal if you need to analyze a random sample of texts without specific metadata filters, or if you're looking for copyrighted materials.

digital-humanities literary-research corpus-linguistics text-analysis literary-scholarship

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 8 / 25

Maturity 16 / 25

Community 14 / 25

Language: Jupyter Notebook

License: GPL-3.0

Higher-rated alternatives

Helsinki-NLP/OpusFilter

OpusFilter - Parallel corpus processing toolkit

natasha/corus

Links to Russian corpora + Python functions for loading and parsing

darija-open-dataset/dataset

darija <-> english dataset

omicsNLP/Auto-CORPus

Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...

SergeyShk/ruTS

Library for extracting statistics from Russian-language texts.