seoyeon9646/MLM-data-augmentation
Masked Language Modeling for data augmentation
This project helps developers augment Korean text datasets using masked language modeling. It takes raw Korean sentences (such as hate speech or movie reviews) and outputs variations in which selected words are replaced with contextually appropriate alternatives. It is aimed at data scientists, machine learning engineers, and NLP researchers who work with Korean language models.
No commits in the last 6 months.
Use this if you need to quickly expand the size and diversity of your Korean text dataset for training natural language processing models, especially when dealing with unrefined or specific types of Korean text.
Not ideal if you require precise, human-curated augmented data or are working with languages other than Korean.
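The word-replacement idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's actual code: the `augment` function, the `Predictor` interface, the `mask_ratio` parameter, and the `toy_predict` stand-in are all hypothetical names introduced here, and whitespace splitting is a simplification that a real Korean tokenizer would replace.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical interface (an assumption, not the repo's API): given a token
# list that contains exactly one MASK entry, return candidate fillers,
# best first.
Predictor = Callable[[Sequence[str]], List[str]]

MASK = "[MASK]"


def augment(
    sentence: str,
    predict: Predictor,
    mask_ratio: float = 0.15,
    rng: Optional[random.Random] = None,
) -> str:
    """Replace a random subset of tokens with the predictor's suggestions."""
    rng = rng or random.Random()
    tokens = sentence.split()  # simplification: whitespace tokenization
    if not tokens:
        return sentence

    # Mask at least one token, and never more than the sentence contains.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    for pos in rng.sample(range(len(tokens)), n_mask):
        original = tokens[pos]
        masked = tokens[:pos] + [MASK] + tokens[pos + 1:]
        # Skip candidates identical to the original so the sentence changes.
        candidates = [c for c in predict(masked) if c != original]
        if candidates:
            tokens[pos] = candidates[0]
    return " ".join(tokens)


def toy_predict(masked_tokens: Sequence[str]) -> List[str]:
    """Stand-in predictor that ignores context; a real one would query an MLM."""
    return ["정말", "매우"]


print(augment("이 영화 진짜 재미있다", toy_predict, mask_ratio=0.25))
```

In practice the predictor would be backed by a pretrained masked language model, for example by wrapping a fill-mask pipeline from the Hugging Face `transformers` library. Which model the project itself uses is not stated here, so that choice is left open.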
Stars: 9
Forks: —
Language: Python
License: —
Category:
Last pushed: Sep 22, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/seoyeon9646/MLM-data-augmentation"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
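The same request can be made from Python with only the standard library. This is a rough sketch under two assumptions that the page does not confirm: that the URL follows a `/quality/<category>/<owner>/<repo>` pattern (generalized from the single example above) and that the endpoint returns JSON. The helper names `quality_url` and `fetch_quality` are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Base path taken from the curl example; the segment layout after it is an
# assumption generalized from that one URL.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the request URL, percent-encoding each path segment."""
    parts = [urllib.parse.quote(p, safe="") for p in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch and decode the response, assuming the body is JSON."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("transformers", "seoyeon9646", "MLM-data-augmentation")` issues the same GET request as the curl command. The structure of the returned data is not documented on this page, so inspect it before relying on specific fields.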
Higher-rated alternatives
Tongjilibo/bert4torch
An elegant PyTorch implementation of transformers
nyu-mll/jiant
jiant is an nlp toolkit
lonePatient/TorchBlocks
A PyTorch-based toolkit for natural language processing
monologg/JointBERT
Pytorch implementation of JointBERT: "BERT for Joint Intent Classification and Slot Filling"
grammarly/gector
Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite"...