r-kovalch/omnigec-data
End-to-end pipelines, notebooks, and configs for assembling the multilingual OmniGEC silver-standard corpus (WikiEdits, Reddit, UberText, MultiGEC-25) and preparing it for model training.
No commits in the last 6 months.
Stars: 3
Forks: n/a
Language: Jupyter Notebook
License: n/a
Category:
Last pushed: May 28, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/r-kovalch/omnigec-data"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
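The same request can be sketched in Python using only the standard library. This is a minimal sketch, not a documented client: the `category/owner/repo` path layout is inferred from the single URL in the curl command above, the response is assumed to be JSON (the page does not say), and the helper names `build_quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl example; everything after it is assumed
# to follow the pattern <category>/<owner>/<repo>.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads raises a ValueError and the raw bytes should be inspected.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "r-kovalch", "omnigec-data")` issues the same GET request as the curl command. No key is sent, so the call counts against the 100 requests/day keyless limit.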
Higher-rated alternatives
PaddlePaddle/ERNIE
The official repository for ERNIE 4.5 and ERNIEKit – its industrial-grade development toolkit...
eyurtsev/kor
LLM(😽)
NiuTrans/NLPBook
A comprehensive book on neural networks and large language models in NLP
bigscience-workshop/data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
allenai/TOPICAL
🪄📄 TOPICAL: TOPIC pages AutomagicaLly