jasonwei20/eda_nlp
Data augmentation for NLP, presented at EMNLP 2019
This tool helps improve the accuracy of text classification models, especially when you have a small dataset. It takes your existing labeled text data and generates new, subtly varied sentences, effectively expanding your training set. This is ideal for machine learning engineers, data scientists, or researchers who are building models to categorize text.
1,651 stars. No commits in the last 6 months.
Use this if you are working on a text classification project and your model's performance is limited by the amount of available training data.
Not ideal if you already have a very large text dataset or if you require highly specialized, domain-specific augmentation beyond simple word edits.
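The "simple word edits" mentioned above can be illustrated with a minimal sketch. It is not the repository's own code or API: the function names (`random_swap`, `random_deletion`, `augment`) and the default parameters are hypothetical, chosen for illustration. It covers only two dictionary-free edits, swapping and deleting words. The EDA paper also uses synonym replacement and random insertion, which need a thesaurus and are left out here.

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of `words` with `n_swaps` random pairs of positions exchanged."""
    out = list(words)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct positions
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word with probability `p`, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(sentence, label, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Return the original (sentence, label) pair plus `n_aug` edited variants.

    Hypothetical helper: every variant keeps the original label, which is the
    assumption that makes this kind of augmentation usable for classification.
    """
    rng = random.Random(seed)
    words = sentence.split()
    examples = [(sentence, label)]
    for k in range(n_aug):
        # Alternate between the two edit types.
        if k % 2 == 0:
            new_words = random_swap(words, n_swaps, rng)
        else:
            new_words = random_deletion(words, p_delete, rng)
        examples.append((" ".join(new_words), label))
    return examples


if __name__ == "__main__":
    for text, lab in augment("the movie was surprisingly good", "positive"):
        print(lab, "|", text)
```

Because the edits are random, some variants may equal the original sentence; deduplicating the output before training is a reasonable extra step.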
Stars: 1,651
Forks: 313
Language: Python
License: not listed
Category:
Last pushed: Mar 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/jasonwei20/eda_nlp"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
MinishLab/model2vec
Fast State-of-the-Art Static Embeddings
AnswerDotAI/ModernBERT
Bringing BERT into modernity via both architecture changes and scaling
tensorflow/hub
A library for transfer learning by reusing parts of TensorFlow models.
Embedding/Chinese-Word-Vectors
100+ pre-trained Chinese word vectors
twang2218/vocab-coverage
Analysis of language models' Chinese cognitive ability