grobidOrg/grobid

A machine learning software for extracting information from scholarly documents

/ 100

Established

This tool helps researchers, librarians, and data scientists automatically extract detailed information from scholarly PDFs. You feed it research papers in PDF format, and it outputs structured data like titles, authors, abstracts, references, and even full-text sections, ready for analysis or database entry. This is ideal for anyone needing to process large volumes of academic literature to organize, search, or build knowledge bases.
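As a rough illustration of the "structured data" mentioned above: GROBID returns its results as TEI XML, which can be read with any XML library. The sketch below is a minimal example, not GROBID's own client code. It assumes a TEI response is already available as a string. The `SAMPLE_TEI` document and the `extract_header` helper are hypothetical, and the element layout is an assumption modeled on typical header output.

```python
import xml.etree.ElementTree as ET

# TEI namespace in Clark notation, as used by ElementTree.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical, hand-written TEI fragment standing in for a real response.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">An Example Paper</title></titleStmt>
      <sourceDesc><biblStruct><analytic>
        <author><persName>
          <forename type="first">Ada</forename><surname>Lovelace</surname>
        </persName></author>
      </analytic></biblStruct></sourceDesc>
    </fileDesc>
    <profileDesc><abstract><p>A short abstract.</p></abstract></profileDesc>
  </teiHeader>
</TEI>"""


def _text(element):
    """Return the whitespace-normalized text of an element, or '' if missing."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def extract_header(tei_xml):
    """Pull title, author names, and abstract out of a TEI header (assumed layout)."""
    root = ET.fromstring(tei_xml)
    header = root.find(TEI + "teiHeader")
    if header is None:
        return {"title": "", "authors": [], "abstract": ""}

    authors = []
    # Only look inside the header so that cited authors are not picked up.
    for author in header.iter(TEI + "author"):
        pers = author.find(TEI + "persName")
        if pers is None:
            continue
        parts = [_text(el) for el in pers
                 if el.tag in (TEI + "forename", TEI + "surname")]
        name = " ".join(p for p in parts if p)
        if name:
            authors.append(name)

    return {
        "title": _text(header.find(".//" + TEI + "title")),
        "authors": authors,
        "abstract": _text(header.find(".//" + TEI + "abstract")),
    }


if __name__ == "__main__":
    print(extract_header(SAMPLE_TEI))
```

Restricting the search to the header is a deliberate choice in this sketch: a full-text result also lists the authors of every cited reference, and those should not be mixed in with the paper's own authors.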

4,703 stars. Actively maintained with 27 commits in the last 30 days.

Use this if you need to systematically pull out specific pieces of information, such as bibliographic data or structured full text, from a collection of scientific or technical PDF documents.

Not ideal if you only need plain full-text extraction without detailed parsing, or if your documents are not academic papers.

academic-publishing research-data-management library-science scientific-information-extraction bibliographic-analysis

No package. No dependents.

Maintenance 20 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 21 / 25

Stars: 4,703

Forks: 538

Language: Java

License: Apache-2.0

Related tools

lihanghang/NLP-Knowledge-Graph

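Research and applications of natural language processing, knowledge graphs, dialogue systems, large models, and related technologies.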

obss/jury

Comprehensive NLP Evaluation System

yzhangcs/parser

:rocket: State-of-the-art parsers for natural language.

alibaba/EasyNLP

EasyNLP: A Comprehensive and Easy-to-use NLP Toolkit

polakowo/textai

Applications using state-of-the-art in NLP
