dbamman/book-nlp

Natural language processing pipeline for book-length documents (archival Java version; for current Python version, see: https://github.com/booknlp/booknlp)

/ 100

Emerging

This tool helps researchers in literary studies or digital humanities automatically analyze long English texts, like novels. It processes a plain text file, identifying characters, mapping aliases to a single character, and attributing dialogue. The output is a highly detailed, annotated version of the text and a JSON file with character features, useful for large-scale textual analysis.

316 stars. No commits in the last 6 months.

Use this if you need to deeply analyze literary texts, track characters, and understand narrative structure without manually reading and annotating every single book.

Not ideal if you're working with short documents or non-English texts, or if you primarily need simple word counts or topic modeling rather than deep character and discourse analysis.

literary-analysis digital-humanities textual-scholarship narrative-analysis character-studies

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 19 / 25


Stars: 316

Forks

Language: Java

License: none

Higher-rated alternatives

apache/opennlp

Apache OpenNLP

stanfordnlp/CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing,...

stanfordnlp/python-stanford-corenlp

Python interface to CoreNLP using a bidirectional server-client interface.

dkpro/dkpro-core

Collection of software components for natural language processing (NLP) based on the Apache UIMA...

apache/opennlp-sandbox

Apache OpenNLP Sandbox
