google/langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization.
This tool helps researchers, analysts, and other non-technical professionals pull specific, structured facts from large amounts of unstructured text, such as clinical notes, reports, or literary works. You provide raw text and define the information you are looking for (e.g., characters, medications, relationships); it returns an organized list of the extracted details, each with its exact location in the original document, plus an interactive visualization. It suits anyone who needs to systematically find and verify specific data points across many documents without manual review.
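To make "exact location in the original document" concrete, here is a minimal sketch of source grounding in plain Python. It does not use langextract and is not the library's API: the `GroundedSpan` class, the `ground` function, and the sample note are all hypothetical, and the candidate facts are hard-coded where a real pipeline would get them from an LLM.

```python
from dataclasses import dataclass


@dataclass
class GroundedSpan:
    """One extracted fact tied to its position in the source (hypothetical type)."""
    label: str  # kind of fact, e.g. "medication"
    text: str   # exact substring as it appears in the source
    start: int  # character offset where the span begins
    end: int    # character offset one past the last character


def ground(source: str, candidates: list[tuple[str, str]]) -> list[GroundedSpan]:
    """Attach character offsets to (label, text) pairs.

    Candidates that do not occur verbatim in the source are dropped,
    so every returned span can be checked against the original text.
    Only the first occurrence of each candidate is recorded.
    """
    spans = []
    for label, text in candidates:
        start = source.find(text)
        if start != -1:
            spans.append(GroundedSpan(label, text, start, start + len(text)))
    return spans


# Hypothetical input: a one-line clinical note and three candidate facts,
# one of which ("ibuprofen") does not appear in the note.
note = "Patient was given 250 mg of amoxicillin twice daily."
found = ground(
    note,
    [("dosage", "250 mg"), ("medication", "amoxicillin"), ("medication", "ibuprofen")],
)

for span in found:
    # Slicing the source with the stored offsets reproduces the extracted text.
    print(span.label, repr(note[span.start:span.end]), span.start, span.end)
```

The point of the offsets is verifiability: any extraction that cannot be located in the source is discarded instead of reported, which is the property the "Use this if" guidance below depends on.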
34,668 stars. Actively maintained with 11 commits in the last 30 days. Available on PyPI.
Use this if you need to extract specific types of information from large volumes of text documents and want to ensure the extracted data is directly traceable back to its source.
Not ideal if your task requires summarizing or generating new text rather than strictly extracting existing facts, or if you don't need to verify extractions against their original context.
Stars: 34,668
Forks: 2,330
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 25, 2026
Commits (30d): 11
Dependencies: 17
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/google/langextract"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
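The same request can be sketched in Python with only the standard library. Several things here are assumptions, not documented behavior: that the path segments after `/quality/` mean category, owner, and repository (inferred from the single URL above), and that the endpoint returns JSON. The function names are hypothetical, and how an API key is passed is not stated on this page, so the sketch covers keyless access only.

```python
import json
import urllib.request

# Base path copied from the curl command above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the category/owner/repo layout is an assumption."""
    return f"{BASE}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository, assuming the response body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (performs a network request, so it is left commented out):
# data = fetch_quality("nlp", "google", "langextract")
# print(json.dumps(data, indent=2))
```

With a 100 requests/day keyless limit, caching the result locally is worth considering if the script runs repeatedly.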
Related tools
Extralit/extralit
Fast and accurate systemic data extraction with LLM assistance
Keyvanhardani/german-ocr
German-OCR is specifically trained to extract text from German documents including invoices,...
oidlabs-com/Lexoid
Multimodal document parser for high quality data understanding and extraction
xingbow/SciDaEx
Structured data extraction from research literature
parsee-ai/parsee-core
Retrieval of fully structured data made easy. Use LLMs or custom models. Specialized on PDFs and...