DFKI-NLP/product-corpus
This repository contains the DFKI Product Corpus, a dataset of 174 documents annotated for product and company named entities, and the relation CompanyProvidesProduct.
This corpus is a collection of 174 English web pages and social media posts, annotated to mark company names, product names, and the relations between them (i.e., which company provides which product). It supports extracting non-standard, business-to-business products and the companies that provide them from unstructured text, and is designed for data scientists and researchers building tools that automatically identify company-product relationships.
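To make the annotation scheme concrete, here is a minimal sketch of how a document with company/product entities and a `CompanyProvidesProduct` relation could be held in memory. It is an illustration under stated assumptions, not the corpus's actual file format: the class names, the `COMPANY`/`PRODUCT` label strings, the character-offset convention, and the example sentence are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical in-memory model; this is NOT the corpus's on-disk format.

@dataclass(frozen=True)
class Entity:
    id: str
    label: str  # assumed labels: "COMPANY" or "PRODUCT"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive

@dataclass(frozen=True)
class Relation:
    label: str  # e.g. "CompanyProvidesProduct"
    head: str   # id of the company entity
    tail: str   # id of the product entity

@dataclass
class Document:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def span(self, entity: Entity) -> str:
        """Return the surface string an entity annotation covers."""
        return self.text[entity.start:entity.end]

    def company_product_pairs(self) -> List[Tuple[str, str]]:
        """Collect (company, product) strings for every CompanyProvidesProduct relation."""
        by_id = {e.id: e for e in self.entities}
        pairs = []
        for rel in self.relations:
            if rel.label != "CompanyProvidesProduct":
                continue
            pairs.append((self.span(by_id[rel.head]), self.span(by_id[rel.tail])))
        return pairs

# Usage with an invented sentence (not taken from the corpus).
text = "Acme Corp supplies the RoboWeld 3000 welding cell."
c_start = text.index("Acme Corp")
p_start = text.index("RoboWeld 3000")
doc = Document(
    text=text,
    entities=[
        Entity("T1", "COMPANY", c_start, c_start + len("Acme Corp")),
        Entity("T2", "PRODUCT", p_start, p_start + len("RoboWeld 3000")),
    ],
    relations=[Relation("CompanyProvidesProduct", head="T1", tail="T2")],
)
pairs = doc.company_product_pairs()
```

Storing entities by character offset and relations by entity id keeps the relation layer independent of tokenization, which is a common choice for span-based NER and relation-extraction datasets.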
No commits in the last 6 months.
Use this if you are developing or training a system that needs to accurately identify business products and the companies that offer them from general text.
Not ideal if you need a dataset of consumer products, or one that covers entity types beyond companies and their products.
Stars: 12
Forks: 1
Language: not specified
License: CC-BY-4.0
Last pushed: Sep 17, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/DFKI-NLP/product-corpus"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
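The same request can be made from Python with only the standard library. This is a sketch under assumptions: it presumes the endpoint returns JSON, and the helper names `quality_url` and `fetch_quality` are hypothetical. It sticks to keyless access, since how a free key is supplied is not described here.

```python
import json
import urllib.request

# Base path taken from the curl example above; other repos are assumed to follow the same pattern.
BASE = "https://pt-edge.onrender.com/api/v1/quality/nlp"

def quality_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug (hypothetical helper)."""
    owner, name = repo.split("/", 1)
    return f"{BASE}/{owner}/{name}"

def fetch_quality(repo: str, timeout: float = 10.0) -> dict:
    """GET the endpoint without a key (100 requests/day) and parse the body as JSON (assumed format)."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("DFKI-NLP/product-corpus")` issues the same request as the curl command.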
Higher-rated alternatives
Helsinki-NLP/OpusFilter
OpusFilter - Parallel corpus processing toolkit
natasha/corus
Links to Russian corpora + Python functions for loading and parsing
SergeyShk/ruTS
A library for extracting statistics from Russian-language texts.
darija-open-dataset/dataset
darija <-> english dataset
omicsNLP/Auto-CORPus
Auto-CORPus pipeline developed by a University of Nottingham and Imperial College London...