amazon-science/webie
Dataset for web-scale information extraction.
This dataset helps NLP researchers and developers build and evaluate models that extract specific information from web pages. It draws raw web content from the C4 dataset and provides annotated examples that show what information should be extracted and where it is located. It is aimed at people who develop or refine information extraction systems.
No commits in the last 6 months.
Use this if you are developing or testing models for information extraction from unstructured web text and need a robust, pre-annotated dataset.
Not ideal if you are looking for a ready-to-use information extraction tool or do not work with NLP model training.
Stars: 8
Forks: 1
Language: Python
License: not listed
Category:
Last pushed: Jul 26, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/amazon-science/webie"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
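The same request can be made from Python using only the standard library. This is a minimal sketch, not official client code: the helper names `build_url` and `fetch_quality` are hypothetical, the path layout (category, then owner/repo) is inferred from the single curl URL above, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example; the segments after it
# (category, then owner/repo) are inferred from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{repo}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads raises a ValueError.
    """
    request = urllib.request.Request(
        build_url(category, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Performs one network request, which counts against the daily limit.
    print(fetch_quality("nlp", "amazon-science/webie"))
```

Because the keyless tier is limited to 100 requests/day, it may be worth caching the result locally rather than calling `fetch_quality` repeatedly.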
Higher-rated alternatives
zjunlp/OpenUE
[EMNLP 2020] OpenUE: An Open Toolkit of Universal Extraction from Text
OpenSextant/Xponents
Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction...
BaptisteBlouin/EventExtractionPapers
A list of NLP resources focused on event extraction task
philipperemy/stanford-openie-python
Stanford Open Information Extraction made simple!
uma-pi1/minie
An open information extraction system that provides compact extractions