jasonwei20/eda_nlp

Data augmentation for NLP, presented at EMNLP 2019

Emerging

This tool helps improve the accuracy of text classification models, especially when the training set is small. It takes existing labeled text and generates new, slightly varied sentences through simple word-level edits, which expands the training set. It suits machine learning engineers, data scientists, and researchers who build models to categorize text.
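
The idea of expanding a labeled set through small word-level edits can be sketched in a few lines. This is a minimal illustration, not the repository's actual code or API: the function names (`random_swap`, `random_deletion`, `augment`) and the parameters (`n_aug`, `p`, `seed`) are hypothetical, and it shows only two dependency-free edits (swapping and deleting words).

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of `words` with two random positions swapped, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(sentence, label, n_aug=4, p=0.1, seed=0):
    """Produce n_aug (sentence, label) pairs; each variant keeps the original label."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    words = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            # Scale the number of swaps with sentence length, at least one.
            new_words = random_swap(words, max(1, int(p * len(words))), rng)
        else:
            new_words = random_deletion(words, p, rng)
        variants.append((" ".join(new_words), label))
    return variants


if __name__ == "__main__":
    for text, lab in augment("the movie was surprisingly good", "positive"):
        print(lab, "|", text)
```

Each generated sentence inherits the label of its source sentence, so the edit rate `p` is usually kept small: aggressive edits risk changing the meaning and making the inherited label wrong.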

1,651 stars. No commits in the last 6 months.

Use this if you are working on a text classification project and your model's performance is limited by the amount of available training data.

Not ideal if you already have a very large text dataset or if you require highly specialized, domain-specific augmentation beyond simple word edits.

text-classification natural-language-processing machine-learning-training data-scarcity model-performance

No License · Stale 6m · No Package · No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 8 / 25

Community 24 / 25

Stars: 1,651

Forks: 313

Language: Python

License: None

Higher-rated alternatives

MinishLab/model2vec

Fast State-of-the-Art Static Embeddings

AnswerDotAI/ModernBERT

Bringing BERT into modernity via both architecture changes and scaling

tensorflow/hub

A library for transfer learning by reusing parts of TensorFlow models.

Embedding/Chinese-Word-Vectors

100+ Chinese Word Vectors (more than a hundred sets of pre-trained Chinese word vectors)

twang2218/vocab-coverage

Analysis of the Chinese-language comprehension ability of language models
