CeON/CERMINE

Content ExtRactor and MINEr


Established

This tool helps researchers, librarians, and data scientists automatically extract key information from academic publications in PDF form. It accepts PDF files, individual reference strings, or affiliation strings as input, and it outputs structured metadata, full text, and parsed references in formats such as NLM JATS or BibTeX. It is designed for anyone who needs to process and analyze large collections of scientific literature efficiently.

513 stars. No commits in the last 6 months.

Use this if you need to programmatically or batch-process PDF scientific papers to extract their metadata, full text, or bibliographic references for analysis or ingestion into databases.

Not ideal if you only need to process a few files manually or prefer a graphical user interface over command-line tools or a web service.
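
For the batch-processing case described above, a minimal Java sketch might look like the following. It only lists the PDFs in a directory and assembles a command line, and it runs nothing. The main class `pl.edu.icm.cermine.ContentExtractor` and the `-path` flag are assumptions recalled from the project's README and should be checked against the release you use. The jar file name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CermineBatch {

    // Collect the PDF files directly inside a directory (non-recursive), sorted by path.
    static List<Path> listPdfs(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Assemble the command line for a batch run over a directory.
    // ASSUMPTION: the main class name and the "-path" flag follow the
    // project's README; verify them against your CERMINE version.
    static List<String> buildCommand(Path jar, Path pdfDir) {
        return Arrays.asList(
                "java", "-cp", jar.toString(),
                "pl.edu.icm.cermine.ContentExtractor",
                "-path", pdfDir.toString());
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: CermineBatch <cermine-jar> <pdf-directory>");
            return;
        }
        Path jar = Paths.get(args[0]);      // hypothetical jar location
        Path pdfDir = Paths.get(args[1]);

        List<Path> pdfs = listPdfs(pdfDir);
        System.out.println("Found " + pdfs.size() + " PDF file(s) in " + pdfDir);

        // Print the command rather than executing it, so the sketch has no side effects.
        System.out.println(String.join(" ", buildCommand(jar, pdfDir)));
    }
}
```

To actually launch the extraction, the printed command could be passed to `ProcessBuilder`. The sketch stops short of that so it can be tried without a CERMINE jar on hand.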

academic-publishing literature-review bibliometrics research-data-management scientific-text-mining

Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 24 / 25


Stars

513

Forks

Language

Java

License

AGPL-3.0

Related tools

rosette-api/java

Babel Street Analytics Client Library for Java

kermitt2/entity-fishing

A machine learning tool for fishing entities

vinhkhuc/JFastText

Java interface for fastText

vinhkhuc/jcrfsuite

Java interface for CRFsuite: http://www.chokkan.org/software/crfsuite/

TechPrimers/core-nlp-example

Natural Language Processing Example using Stanford's Core NLP Java Library
