dpasse/extr-ds

Library to programmatically build labeled datasets for Named-Entity Recognition (NER) and Relation Extraction (RE) Machine Learning tasks

/ 100

Emerging

This tool helps data scientists and ML engineers create high-quality, labeled text datasets for training custom AI models. You provide raw text and a set of rules, and it automatically generates structured labels identifying specific entities (like names or places) and the relationships between them. This helps you efficiently prepare data for tasks like automatically extracting information from documents.
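To make the "raw text plus rules" idea concrete, here is a minimal sketch of rule-based labeling using only the Python standard library. It does not use or reproduce the extr-ds API. The `Entity` class, the `ENTITY_RULES` table, the `LIVES_IN` relation, and both function names are hypothetical and chosen purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str
    text: str
    start: int
    end: int


# Hypothetical labeling rules: entity label -> regular expression.
ENTITY_RULES = {
    "PERSON": re.compile(r"\b(?:Alice|Bob)\b"),
    "CITY": re.compile(r"\b(?:Paris|Berlin)\b"),
}


def label_entities(text):
    """Apply every rule to the text and return entities sorted by position."""
    entities = []
    for label, pattern in ENTITY_RULES.items():
        for match in pattern.finditer(text):
            entities.append(Entity(label, match.group(), match.start(), match.end()))
    return sorted(entities, key=lambda e: e.start)


def label_relations(text, entities):
    """Emit a LIVES_IN relation when only 'lives in' separates a PERSON from a CITY."""
    relations = []
    for person in entities:
        if person.label != "PERSON":
            continue
        for city in entities:
            if city.label != "CITY" or city.start < person.end:
                continue
            between = text[person.end:city.start]
            if re.fullmatch(r"\s+lives in\s+", between):
                relations.append((person.text, "LIVES_IN", city.text))
    return relations


sample = "Alice lives in Paris. Bob visited Berlin."
entities = label_entities(sample)
relations = label_relations(sample, entities)
print(entities)
print(relations)
```

Each entity keeps its character offsets, so the output can later be converted into whatever span or token format a training pipeline expects. Real rule sets would be larger and would need review for false positives.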

No commits in the last 6 months. Available on PyPI.

Use this if you need to programmatically build large, labeled text datasets for training AI models to recognize entities or relationships within text.

Not ideal if you prefer manual annotation for small datasets or if you're not comfortable defining labeling rules programmatically.

natural-language-processing data-labeling information-extraction machine-learning-engineering

Maintenance 0 / 25

Adoption 4 / 25

Maturity 25 / 25

Community 8 / 25

Language: Python

License: MIT

Higher-rated alternatives

davidsbatista/BREDS

"Bootstrapping Relationship Extractors with Distributional Semantics" (Batista et al., 2015) in...

davidsbatista/Snowball

Implementation with some extensions of the paper "Snowball: Extracting Relations from Large...

nicolay-r/AREkit

Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing...

plkmo/BERT-Relation-Extraction

PyTorch implementation for "Matching the Blanks: Distributional Similarity for Relation Learning" paper

thunlp/FewRel

A Large-Scale Few-Shot Relation Extraction Dataset
