amazon-science/webie

Dataset for web-scaled information extraction.

Experimental

This dataset helps NLP researchers and developers build and evaluate models that extract specific information from web pages. It takes raw web content from the C4 dataset and provides annotated examples that show what information should be extracted and where it is located. It is intended for those who develop or refine information extraction systems.

No commits in the last 6 months.

Use this if you are developing or testing models for information extraction from unstructured web text and need a pre-annotated dataset.

Not ideal if you are looking for a ready-to-use information extraction tool or do not work with NLP model training.

natural-language-processing information-extraction machine-learning-datasets web-content-analysis model-evaluation

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 8 / 25

Language

Python

License

—

Higher-rated alternatives

zjunlp/OpenUE

[EMNLP 2020] OpenUE: An Open Toolkit of Universal Extraction from Text

OpenSextant/Xponents

Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction...

BaptisteBlouin/EventExtractionPapers

A list of NLP resources focused on event extraction task

philipperemy/stanford-openie-python

Stanford Open Information Extraction made simple!

uma-pi1/minie

An open information extraction system that provides compact extractions
