microsoft/CodeMixed-Text-Generator
This tool automates the generation of grammatically valid synthetic code-mixed data by drawing on linguistic theories such as the Equivalence Constraint Theory and the Matrix Language Theory.
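The Equivalence Constraint idea is that a switch between languages is acceptable at points where the surface word orders of both languages line up, so that neither grammar is violated. The sketch below is a deliberately simplified, alignment-only reading of that idea and is not the repository's implementation. It assumes you already have a parallel sentence pair and a word alignment, given as hypothetical `(index_in_a, index_in_b)` pairs from some external aligner. A cut is accepted only when no alignment link crosses it.

```python
from typing import List, Tuple

# (index in sentence A, index in sentence B); assumed to come from an external word aligner
Alignment = List[Tuple[int, int]]


def valid_switch_points(len_a: int, len_b: int, alignment: Alignment) -> List[Tuple[int, int]]:
    """Return cuts (k, m) splitting A into A[:k] | A[k:] and B into B[:m] | B[m:]
    such that no alignment link crosses the cut.

    Simplified reading of the Equivalence Constraint: each link (i, j) must fall
    on the same side of the cut in both sentences, i.e. (i < k) == (j < m).
    """
    points = []
    for k in range(1, len_a):  # skip empty prefixes/suffixes in A
        for m in range(1, len_b):  # same for B
            if all((i < k) == (j < m) for i, j in alignment):
                points.append((k, m))
    return points


def generate_code_mixed(a: List[str], b: List[str], alignment: Alignment) -> List[str]:
    """Build one switched sentence in each direction for every valid cut."""
    outputs = []
    for k, m in valid_switch_points(len(a), len(b), alignment):
        outputs.append(" ".join(a[:k] + b[m:]))  # start in A, switch to B
        outputs.append(" ".join(b[:m] + a[k:]))  # start in B, switch to A
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(outputs))


english = ["I", "like", "mangoes"]
hindi = ["mujhe", "aam", "pasand", "hai"]  # romanized Hindi
links = [(0, 0), (1, 2), (2, 1)]  # I-mujhe, like-pasand, mangoes-aam

for sentence in generate_code_mixed(english, hindi, links):
    print(sentence)
# I aam pasand hai
# mujhe like mangoes
```

Because English (SVO) and Hindi (SOV) order this pair differently, only one cut survives. Note that "I aam pasand hai" passes the alignment check even though Hindi would expect the dative "mujhe" here, which shows why a fuller treatment needs syntactic information beyond word alignment.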
This tool helps researchers and language experts generate synthetic code-mixed text for languages where data is scarce. You provide parallel sentences in two languages, and it outputs grammatically valid, artificial code-mixed sentences. This is ideal for linguists or NLP researchers needing data to train or evaluate language models.
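The Matrix Language idea can be illustrated in a similarly simplified way: one language (the matrix language) supplies the sentence frame, meaning its word order and function words, while the other (the embedded language) contributes content words. The sketch below is an illustration under stated assumptions, not the tool's algorithm. The POS tags, the one-to-one `alignment` dictionary, and the `switchable_tags` parameter are all hypothetical inputs you would have to produce yourself.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def matrix_language_insertions(
    matrix_tokens: List[str],
    matrix_tags: List[str],
    embedded_tokens: List[str],
    alignment: Dict[int, int],
    switchable_tags: FrozenSet[str] = frozenset({"NOUN"}),
) -> List[str]:
    """Keep the matrix-language frame and swap in embedded-language words.

    alignment maps a matrix token index to its single aligned embedded index.
    Only tokens whose (hypothetical) POS tag is in switchable_tags are replaced;
    every non-empty subset of eligible positions yields one candidate sentence.
    """
    eligible = [
        i for i, tag in enumerate(matrix_tags)
        if tag in switchable_tags and i in alignment
    ]
    outputs = []
    for size in range(1, len(eligible) + 1):
        for subset in combinations(eligible, size):
            mixed = list(matrix_tokens)  # matrix word order and function words stay fixed
            for i in subset:
                mixed[i] = embedded_tokens[alignment[i]]
            outputs.append(" ".join(mixed))
    return outputs


hindi = ["mujhe", "aam", "pasand", "hai"]  # matrix language: romanized Hindi
tags = ["PRON", "NOUN", "ADJ", "AUX"]  # hypothetical tags for illustration
english = ["I", "like", "mangoes"]  # embedded language
links = {0: 0, 1: 2, 2: 1}  # mujhe-I, aam-mangoes, pasand-like

print(matrix_language_insertions(hindi, tags, english, links))
# ['mujhe mangoes pasand hai']
```

Enumerating every subset grows exponentially with the number of eligible words, so for long sentences you would likely sample subsets instead of listing them all.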
No commits in the last 6 months.
Use this if you need to create large amounts of artificial, grammatically correct code-mixed text from existing parallel translations to address data scarcity for multilingual language processing.
Not ideal if you want a simple, off-the-shelf solution for casual code-mixing, or if you're uncomfortable with some technical setup.
Stars: 58
Forks: 13
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Jul 30, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/microsoft/CodeMixed-Text-Generator"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
batzner/tensorlm
Wrapper library for text generation / language models at character and word level with RNNs in TensorFlow
EagleW/PaperRobot
Code for PaperRobot: Incremental Draft Generation of Scientific Ideas
Cyrilvallez/TextWiz
An even simpler way to generate text with LLMs.
yingpengma/Awesome-Story-Generation
This repository collects an extensive list of awesome papers about Story Generation /...
tallpauley/wordsiv
A Python library for generating text with a limited character set, with type proofing in mind