seoyeon9646/MLM-data-augmentation

Masked Language Modeling for data augmentation

13 / 100 (Experimental)

This project helps developers augment Korean text datasets using a masked language model. It takes raw Korean sentences (such as hate speech or movie reviews) and outputs variations in which selected words are replaced with contextually appropriate alternatives. It is aimed at data scientists, machine learning engineers, and NLP researchers working with Korean language models.
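The word-replacement idea described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `augment` and `toy_fill_mask`, the `mask_ratio` parameter, the `[MASK]` placeholder, and the whitespace-based word splitting are all assumptions made for the example. The predictor is passed in as a plain callable so the sketch runs without downloading a model; in practice it would likely wrap a Korean masked language model.

```python
import random
from typing import Callable, List, Optional

MASK = "[MASK]"  # assumed placeholder; real models define their own mask token


def augment(
    sentence: str,
    fill_mask: Callable[[str], List[str]],
    mask_ratio: float = 0.15,
    rng: Optional[random.Random] = None,
) -> str:
    """Return a variant of `sentence` with some words swapped for predictions.

    `fill_mask` is a hypothetical callable: it receives the sentence with one
    word replaced by MASK and returns candidate words, best first.
    """
    rng = rng or random.Random()
    words = sentence.split()  # simplification: split on whitespace
    if not words:
        return sentence

    # Mask at least one word, but never more words than the sentence has.
    n_mask = min(len(words), max(1, round(len(words) * mask_ratio)))
    for pos in rng.sample(range(len(words)), n_mask):
        original = words[pos]
        masked = " ".join(words[:pos] + [MASK] + words[pos + 1:])
        # Skip candidates identical to the original word so the output differs.
        candidates = [c for c in fill_mask(masked) if c != original]
        if candidates:
            words[pos] = candidates[0]
    return " ".join(words)


def toy_fill_mask(masked_text: str) -> List[str]:
    """Stand-in predictor that ignores context; a real model would not."""
    return ["정말", "너무"]


if __name__ == "__main__":
    print(augment("이 영화 진짜 재미있다", toy_fill_mask, mask_ratio=0.25))
```

Masking one position at a time keeps each prediction conditioned on the rest of the sentence, including any words already replaced in earlier steps.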

No commits in the last 6 months.

Use this if you need to quickly expand the size and diversity of your Korean text dataset for training natural language processing models, especially when dealing with unrefined or specific types of Korean text.

Not ideal if you require precise, human-curated augmented data or are working with languages other than Korean.

Tags: Korean NLP, text data augmentation, natural language processing, machine learning, data scientist
Status: No License, Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 8 / 25
Community 0 / 25

Stars: 9
Forks: (not listed)
Language: Python
License: None
Last pushed: Sep 22, 2022
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/seoyeon9646/MLM-data-augmentation"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
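The same request can be made from Python. The sketch below uses only the standard library and makes two assumptions: that the path segment `transformers` is a category followed by the owner and repository name, and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are hypothetical, and the response fields are not documented here, so the result is returned as parsed without interpretation.

```python
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, assuming a category/owner/repo path layout."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and parse the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("transformers", "seoyeon9646", "MLM-data-augmentation")
    print(data)
```

With the keyless limit of 100 requests/day, it is worth caching the result locally rather than calling the endpoint on every run.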