uridr/GTWiki

Dataset for the paper: "A multi-task semi-supervised framework for Text2Graph & Graph2Text"

23 / 100 Experimental

This dataset provides a collection of English text snippets and corresponding knowledge graphs derived from Wikipedia and Wikidata. It is designed for researchers and machine learning engineers building systems that convert natural language into structured knowledge representations such as graphs (Text2Graph), or that generate natural language from existing graphs (Graph2Text). The data supports training models that take raw text or knowledge graph triples as input and produce the other format.
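To make the two formats concrete, here is a minimal sketch of how a text snippet and a set of graph triples might be represented, and how triples can be flattened into a string for a sequence-to-sequence model. The `Triple` class, the `<S>`/`<P>`/`<O>` tags, and the example sentence are all illustrative assumptions; this page does not describe the dataset's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (subject, predicate, object) edge; field names are hypothetical."""
    subject: str
    predicate: str
    obj: str


def linearize(triples):
    """Flatten a graph into a tagged string (tag scheme is an assumption)."""
    return " ".join(
        f"<S> {t.subject} <P> {t.predicate} <O> {t.obj}" for t in triples
    )


def parse(linearized):
    """Invert linearize(): recover the triples from the tagged string."""
    triples = []
    for chunk in linearized.split("<S>")[1:]:
        subject, rest = chunk.split("<P>")
        predicate, obj = rest.split("<O>")
        triples.append(Triple(subject.strip(), predicate.strip(), obj.strip()))
    return triples


# Made-up example, not taken from the dataset.
text = "Ada Lovelace was born in London."
graph = [Triple("Ada Lovelace", "place of birth", "London")]
```

A Text2Graph model would map `text` to something like `linearize(graph)`, and a Graph2Text model would go the other way; `parse` turns a predicted string back into triples for evaluation.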

No commits in the last 6 months.

Use this if you need a non-parallel dataset to train or evaluate models for converting text into structured graphs or vice versa, especially in an unsupervised or semi-supervised setting.

Not ideal if you require a parallel dataset where each text snippet has a directly corresponding, manually annotated graph, or if your domain is not covered by Wikipedia/Wikidata.

knowledge-representation natural-language-processing information-extraction data-synthesis unsupervised-learning
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 0 / 25

Stars 25
Forks
Language Python
License MIT
Last pushed Feb 19, 2022
Commits (30d) 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/uridr/GTWiki"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
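The same request can be made from Python using only the standard library. This is a sketch under two assumptions that the page does not confirm: that the URL follows a category/owner/repo pattern (inferred from the single curl example), and that the response body is JSON. The helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the path pattern is inferred, not documented."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """GET the endpoint and decode the body, assuming it is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (needs network access and counts against the daily request limit):
# data = fetch_quality("nlp", "uridr", "GTWiki")
```

Keeping URL construction separate from the network call makes it easy to check the URL without spending one of the 100 daily requests.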