amazon-science/webie

Dataset for web-scaled information extraction.

Quality score: 28 / 100 (Experimental)

This dataset helps NLP researchers and developers build and evaluate models that extract specific information from web pages. It takes raw web content from the C4 dataset and provides annotated examples that show what information should be extracted and where it is located. It is intended for people who develop or refine information extraction systems.

No commits in the last 6 months.

Use this if you are developing or testing models for information extraction from unstructured web text and need a robust, pre-annotated dataset.

Not ideal if you are looking for a ready-to-use information extraction tool or do not work with NLP model training.

natural-language-processing information-extraction machine-learning-datasets web-content-analysis model-evaluation
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 4 / 25
Maturity 16 / 25
Community 8 / 25
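
The four sub-scores listed above add up to the overall 28 / 100. This suggests the total is a plain sum of the categories, though that is an assumption drawn from the numbers shown here, not a documented scoring rule. A minimal check:

```python
# Sub-scores as listed above, each out of 25.
# Assumption: the overall score is their plain sum (not confirmed by the page).
subscores = {"Maintenance": 0, "Adoption": 4, "Maturity": 16, "Community": 8}

total = sum(subscores.values())
maximum = 25 * len(subscores)
print(f"{total} / {maximum}")
```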


Stars 8
Forks 1
Language Python
License
Last pushed Jul 26, 2023
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/amazon-science/webie"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
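
The same request can be made from Python using only the standard library. This is a sketch under stated assumptions: the category/owner/repository path layout is inferred from the single example URL above, the response is assumed to be JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL copied from the curl command above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL on this page.
    """
    parts = (category, owner, repo)
    return API_BASE + "/" + "/".join(quote(p, safe="") for p in parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint and parse the body as JSON.

    Assumption: the response is JSON; its fields are not documented here,
    so the parsed object is returned unchanged.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "amazon-science", "webie")` requests the same URL as the curl command. With a limit of 100 keyless requests per day, caching the result locally is sensible for repeated lookups.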