425776024/nlpcda

One-click Chinese data augmentation package; NLP data augmentation, BERT data augmentation, EDA: pip install nlpcda

/ 100

Established

This tool helps people who work with Chinese text data to expand their datasets. You provide existing Chinese text, and it generates multiple variations of that text, carefully designed to retain the original meaning. This is useful for anyone training natural language processing (NLP) models, such as AI engineers or data scientists, who need more diverse training examples to improve model performance.
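The idea of generating meaning-preserving variations can be illustrated with a minimal synonym-replacement sketch. This is not nlpcda's own API or implementation: the `SYNONYMS` table, the `augment` function, and its `create_num` and `change_rate` parameters are hypothetical names chosen for illustration, and the input is assumed to be already segmented into words.

```python
import random

# Hypothetical toy synonym table; a real tool would load a much larger lexicon.
SYNONYMS = {
    "喜欢": ["喜爱", "爱好"],
    "电影": ["影片"],
    "好看": ["精彩"],
}


def augment(tokens, create_num=3, change_rate=0.3, seed=None):
    """Return up to create_num distinct sentences, with the original first.

    tokens: a sentence already split into words (segmentation is assumed done).
    change_rate: probability of swapping each word that has a known synonym.
    """
    rng = random.Random(seed)
    results = [list(tokens)]
    # Bound the number of attempts so the loop ends even when few variants exist.
    for _ in range(create_num * 10):
        if len(results) >= create_num:
            break
        variant = [
            rng.choice(SYNONYMS[t])
            if t in SYNONYMS and rng.random() < change_rate
            else t
            for t in tokens
        ]
        if variant not in results:
            results.append(variant)
    return ["".join(v) for v in results]


if __name__ == "__main__":
    for sentence in augment(["我", "喜欢", "这部", "电影"], change_rate=0.5, seed=0):
        print(sentence)
```

Because only words with listed synonyms are ever swapped, every variant keeps the sentence structure of the original, which is the property that makes the output usable as extra training data with the same label.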

1,878 stars. Used by 1 other package. No commits in the last 6 months. Available on PyPI.

Use this if you need to create more diverse training data from your existing Chinese text corpus to make your NLP models more robust and performant.

Not ideal if your primary goal is to gain marginal accuracy on competitive leaderboards, as this tool focuses on improving model generalization rather than raw scores.

NLP-model-training Chinese-text-processing AI-data-preparation text-analytics machine-learning-engineering

Stale 6m

Maintenance 0 / 25

Adoption 11 / 25

Maturity 25 / 25

Community 20 / 25


Stars: 1,878

Forks: 172

Language: Python

License: Apache-2.0

Compare

nlpcda and EDA_NLP_for_Chinese

nlpcda and nlp-data-augmentation

Related tools

dsfsi/textaugment

TextAugment: Text Augmentation Library

searchableai/KitanaQA

KitanaQA: Adversarial training and data augmentation for neural question-answering models

SanghunYun/UDA_pytorch

UDA (Unsupervised Data Augmentation) implemented in PyTorch

google-research/uda

Unsupervised Data Augmentation (UDA)

KennethEnevoldsen/augmenty

Augmenty is an augmentation library based on spaCy for augmenting texts.
