microsoft/CodeMixed-Text-Generator

This tool helps automate the generation of grammatically valid synthetic code-mixed data using linguistic theories such as the Equivalence Constraint Theory and the Matrix Language Theory.

Score: 42 / 100 (Emerging)

This tool helps researchers and language experts generate synthetic code-mixed text for languages where data is scarce. You provide parallel sentences in two languages, and it outputs grammatically valid, artificial code-mixed sentences. This is ideal for linguists or NLP researchers needing data to train or evaluate language models.
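The page does not describe the tool's internal pipeline, so the following is only a toy illustration of the Equivalence Constraint idea it cites, not the tool's implementation. It assumes a hand-written one-to-one word alignment between the two parallel sentences and allows a single switch from the first language to the second. A cut is accepted only when every word left of the cut aligns before every word right of it, so neither language's word order is broken at the join. The function names (`valid_switch_points`, `mix_at`, `generate`) are hypothetical.

```python
from typing import Dict, List


def valid_switch_points(alignment: Dict[int, int], n_src: int) -> List[int]:
    """Return cut positions k where switching after source token k-1 keeps
    both word orders intact (toy Equivalence Constraint check)."""
    points = []
    for k in range(1, n_src):
        left = [alignment[i] for i in range(k) if i in alignment]
        right = [alignment[i] for i in range(k, n_src) if i in alignment]
        # Every aligned target index on the left must precede every one on the right.
        if left and right and max(left) < min(right):
            points.append(k)
    return points


def mix_at(src: List[str], tgt: List[str], alignment: Dict[int, int], k: int) -> str:
    """Keep source tokens before the cut, then continue in the target language."""
    right = [alignment[i] for i in range(k, len(src)) if i in alignment]
    return " ".join(src[:k] + tgt[min(right):])


def generate(src_sentence: str, tgt_sentence: str, alignment: Dict[int, int]) -> List[str]:
    """Produce one code-mixed candidate per valid switch point."""
    src, tgt = src_sentence.split(), tgt_sentence.split()
    return [mix_at(src, tgt, alignment, k) for k in valid_switch_points(alignment, len(src))]


if __name__ == "__main__":
    # Hypothetical English-Spanish pair; alignment maps source index -> target index.
    pairs = {0: 0, 1: 2, 2: 1, 3: 3, 4: 4}
    for sentence in generate("the red house is big", "la casa roja es grande", pairs):
        print(sentence)
```

Run as-is, it prints "the casa roja es grande", "the red house es grande" and "the red house is grande". The cut after "red" is rejected because "red" aligns to "roja", which follows "casa" in Spanish, so "the red casa" would violate Spanish adjective-noun order.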

No commits in the last 6 months.

Use this if you need to create large amounts of artificial, grammatically correct code-mixed text from existing parallel translations to address data scarcity for multilingual language processing.

Not ideal if you're looking for a simple, off-the-shelf solution for casual code-mixing or if you want to avoid technical setup.

natural-language-processing computational-linguistics multilingual-data language-resource-creation
Stale 6m | No Package | No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 18 / 25
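The page does not state the scoring formula, but the four sub-scores shown appear to add up to the headline score, with each category capped at 25:

```latex
0 + 8 + 16 + 18 = 42, \qquad 4 \times 25 = 100
```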

Stars: 58
Forks: 13
Language: Jupyter Notebook
License: MIT
Last pushed: Jul 30, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/microsoft/CodeMixed-Text-Generator"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
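For scripted access, a minimal Python sketch of the same request using only the standard library is shown below. It reuses the URL from the curl example; treating the `nlp` path segment as a category, and assuming the endpoint returns JSON, are guesses from the URL shape rather than documented behavior. The helper names are hypothetical, and no API key is sent because the page does not say how to pass one.

```python
import json
import urllib.request

# Base path copied from the curl example; the segment after it ("nlp")
# is assumed to be a category, based only on the URL shape.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL for an "owner/name" repository."""
    return f"{BASE_URL}/{category}/{repo}"


def parse_response(raw: bytes) -> dict:
    """Decode the response body, assuming the endpoint returns JSON."""
    return json.loads(raw.decode("utf-8"))


def fetch_quality(category: str, repo: str) -> dict:
    """Anonymous GET request; the keyless 100 requests/day limit applies."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=10) as resp:
        return parse_response(resp.read())
```

Calling `fetch_quality("nlp", "microsoft/CodeMixed-Text-Generator")` should return the same data as the curl command. Keeping URL building and parsing in separate functions makes both easy to check without network access.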