CeON/CERMINE
Content ExtRactor and MINEr
This tool helps researchers, librarians, and data scientists automatically extract key information from PDF academic publications. You input PDF files, individual reference strings, or affiliation strings, and it outputs structured metadata, full text, parsed references, and other content in formats like NLM JATS or BibTeX. It's designed for anyone needing to efficiently process and analyze large collections of scientific literature.
513 stars. No commits in the last 6 months.
Use this if you need to programmatically or batch-process PDF scientific papers to extract their metadata, full text, or bibliographic references for analysis or ingestion into databases.
Not ideal if you only need to process a few files by hand, or if you prefer a graphical user interface to command-line tools and web services.
Stars: 513
Forks: 99
Language: Java
License: AGPL-3.0
Category:
Last pushed: Jun 30, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/CeON/CERMINE"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
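For scripted use, the same request can be made from Python with only the standard library. The following is a minimal sketch, not a documented client. The function names and the parameter names `category`, `owner`, and `repo` are hypothetical labels for the three path segments seen in the curl example, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example.

    The meaning of the three path segments is an assumption inferred
    from ".../quality/nlp/CeON/CERMINE".
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON (assumed format)."""
    url = build_quality_url(category, owner, repo)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "CeON", "CERMINE")` issues one request, which counts against the daily limit stated above. If the body turns out not to be JSON, `json.loads` raises `json.JSONDecodeError`, so callers may want to catch that error.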
Related tools
rosette-api/java
Babel Street Analytics Client Library for Java
kermitt2/entity-fishing
A machine learning tool for fishing entities
vinhkhuc/JFastText
Java interface for fastText
vinhkhuc/jcrfsuite
Java interface for CRFsuite: http://www.chokkan.org/software/crfsuite/
TechPrimers/core-nlp-example
Natural Language Processing Example using Stanford's Core NLP Java Library