dbamman/book-nlp
Natural language processing pipeline for book-length documents (archival Java version; for the current Python version, see: https://github.com/booknlp/booknlp)
This tool helps researchers in literary studies and the digital humanities automatically analyze long English texts, such as novels. It takes a plain text file as input and identifies characters, maps their aliases to a single character, and attributes dialogue to speakers. The output is a richly annotated version of the text plus a JSON file of character features, useful for large-scale textual analysis.
316 stars. No commits in the last 6 months.
Use this if you need to analyze literary texts in depth, track characters, and understand narrative structure without manually reading and annotating every book.
Not ideal if you are working with short documents or non-English texts, or if you mainly need simple word counts or topic modeling rather than deep character and discourse analysis.
Stars: 316
Forks: 46
Language: Java
License: —
Category: —
Last pushed: Feb 04, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/dbamman/book-nlp"
Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
Higher-rated alternatives
apache/opennlp: Apache OpenNLP
stanfordnlp/CoreNLP: CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing,...
stanfordnlp/python-stanford-corenlp: Python interface to CoreNLP using a bidirectional server-client interface.
dkpro/dkpro-core: Collection of software components for natural language processing (NLP) based on the Apache UIMA...
apache/opennlp-sandbox: Apache OpenNLP Sandbox