kajyuuen/daaja
This repository implements data augmentation techniques for Japanese NLP.
This tool helps Japanese NLP practitioners expand limited datasets for tasks like text classification and named entity recognition. It takes Japanese sentences, or sequences of words with their labels, as input and generates variations by swapping, inserting, or deleting words, or by replacing them with synonyms. It is aimed at data scientists and machine learning engineers who work with Japanese text and need more data to train robust models.
No commits in the last 6 months.
Use this if you are building machine learning models for Japanese text and need to artificially increase the size and diversity of your training data.
Not ideal if you are working with languages other than Japanese, or if you already have a very large and diverse dataset.
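The four operations named in the description (swap, insert, delete, synonym replacement) can be sketched in plain Python. This is a minimal illustration and not daaja's own API: the function names, the `SYNONYMS` table, and the example tokens are all hypothetical, and the input is assumed to be already tokenized. It handles sentence-level data only; for token-level labels (as in named entity recognition), each operation would also have to update the label sequence.

```python
import random
from typing import Dict, List

# Hypothetical toy synonym table; a real pipeline would use a thesaurus.
SYNONYMS: Dict[str, List[str]] = {
    "猫": ["ネコ", "にゃんこ"],
    "好き": ["大好き"],
}


def random_swap(tokens: List[str], rng: random.Random) -> List[str]:
    """Swap the tokens at two distinct, randomly chosen positions."""
    out = list(tokens)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_delete(tokens: List[str], rng: random.Random, p: float = 0.2) -> List[str]:
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def synonym_replace(tokens: List[str], rng: random.Random) -> List[str]:
    """Replace one token that has an entry in SYNONYMS, if any exists."""
    out = list(tokens)
    candidates = [i for i, t in enumerate(out) if t in SYNONYMS]
    if candidates:
        i = rng.choice(candidates)
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insert(tokens: List[str], rng: random.Random) -> List[str]:
    """Insert a synonym of one existing token at a random position."""
    out = list(tokens)
    candidates = [t for t in out if t in SYNONYMS]
    if candidates:
        word = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), word)
    return out


# Usage: a pre-tokenized sentence ("I like cats"); results vary with the seed.
rng = random.Random(0)
tokens = ["私", "は", "猫", "が", "好き", "です"]
for augment in (random_swap, random_delete, synonym_replace, random_insert):
    print(augment.__name__, "".join(augment(tokens, rng)))
```

Each function returns a new list and leaves the input untouched, so the same source sentence can be augmented repeatedly. Passing an explicit `random.Random` instance keeps runs reproducible.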
Stars: 64
Forks: 5
Language: Python
License: not listed
Category: not listed
Last pushed: Feb 16, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/kajyuuen/daaja"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
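The same request can be made from Python with only the standard library. This is a rough sketch: `quality_url` and `fetch_quality` are hypothetical helper names, and reading the URL as `<category>/<owner>/<repo>` is an assumption based on the single example above. The response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, repo: str) -> str:
    """Build the endpoint URL; the category/repo split is an assumption."""
    return f"{BASE_URL}/{quote(category)}/{quote(repo)}"


def fetch_quality(category: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the endpoint and return the response body as text (unparsed)."""
    with urllib.request.urlopen(quality_url(category, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (performs a network request, so it is left commented out):
# print(fetch_quality("nlp", "kajyuuen/daaja"))
```

At 100 keyless requests per day, caching the response locally is a sensible default for anything run repeatedly.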
Higher-rated alternatives
dsfsi/textaugment
TextAugment: Text Augmentation Library
425776024/nlpcda
One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda
google-research/uda
Unsupervised Data Augmentation (UDA)
searchableai/KitanaQA
KitanaQA: Adversarial training and data augmentation for neural question-answering models
SanghunYun/UDA_pytorch
UDA (Unsupervised Data Augmentation) implemented in PyTorch