seoyeon9646/MLM-data-augmentation

Masked Language Modeling for data augmentation

/ 100

Experimental

This project augments Korean text datasets by using a masked language model to generate variations of existing sentences. It takes raw Korean sentences (such as hate speech or movie reviews) and outputs variations in which certain words are replaced with contextually appropriate alternatives. It is aimed at data scientists, machine learning engineers, and NLP researchers working with Korean language models.
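The word-replacement idea described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the names `augment`, `toy_fill_mask`, and `mask_prob` are hypothetical, whitespace splitting stands in for a real Korean subword tokenizer, and the toy predictor stands in for a real masked language model.

```python
import random
from typing import Callable, List, Optional

MASK = "[MASK]"  # assumed mask token; real models define their own


def augment(
    sentence: str,
    fill_mask: Callable[[str], List[str]],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> str:
    """Replace randomly chosen tokens with predictions from `fill_mask`.

    `fill_mask` receives the sentence with one token replaced by MASK and
    returns candidate replacements, best first (hypothetical interface).
    """
    rng = random.Random(seed)
    tokens = sentence.split()  # simplification: whitespace tokenization
    for i, original in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # leave this token untouched
        masked = " ".join(tokens[:i] + [MASK] + tokens[i + 1:])
        # Skip candidates identical to the original so the sentence changes.
        candidates = [c for c in fill_mask(masked) if c != original]
        if candidates:
            tokens[i] = candidates[0]
    return " ".join(tokens)


def toy_fill_mask(masked_sentence: str) -> List[str]:
    """Hypothetical stand-in for a masked language model: fixed candidates."""
    return ["정말", "진짜"]


# Mask roughly half of the tokens; the result depends on the seed.
print(augment("이 영화 정말 재미있다", toy_fill_mask, mask_prob=0.5, seed=0))
```

In practice the `fill_mask` callback could wrap a real model, for example the Hugging Face `transformers` fill-mask pipeline, which returns a list of dictionaries whose `token_str` field holds each predicted token. Which model and masking strategy this project actually uses is not stated here.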

No commits in the last 6 months.

Use this if you need to quickly expand the size and diversity of a Korean text dataset for training natural language processing models, especially when the text is noisy, uncleaned, or domain-specific.

Not ideal if you require precise, human-curated augmented data or are working with languages other than Korean.

Korean NLP, text data augmentation, natural language processing, machine learning, data scientist

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 5 / 25

Maturity 8 / 25

Community 0 / 25

Stars

Forks

—

Language

Python

License

—

Higher-rated alternatives

Tongjilibo/bert4torch

An elegant PyTorch implementation of transformers

nyu-mll/jiant

jiant is an NLP toolkit

lonePatient/TorchBlocks

A PyTorch-based toolkit for natural language processing

monologg/JointBERT

PyTorch implementation of JointBERT: "BERT for Joint Intent Classification and Slot Filling"

grammarly/gector

Official implementation of the papers "GECToR – Grammatical Error Correction: Tag, Not Rewrite"...
