pradeepdev-1995/databalancer

Databalancer is a Python library for machine learning applications that balances imbalanced text classification datasets before model training.

Experimental

This library helps machine learning practitioners prepare text data for classification models. It takes an imbalanced text dataset (e.g., a CSV file with a text column and a category column) and generates new, synthetic text examples for the under-represented categories. The output is a balanced dataset ready for model training, which can improve model performance on the categories that were previously rare.
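The idea of filling under-represented categories with synthetic rows can be sketched in a few lines. This is a minimal illustration and not Databalancer's API or its generation method: the function names `swap_two_words` and `balance_by_augmentation` are hypothetical, and a naive word swap stands in for whatever text generation the library actually performs.

```python
import random
from collections import Counter, defaultdict


def swap_two_words(text, rng):
    """Return a variant of `text` with two randomly chosen words swapped.

    Stand-in augmentation only; real libraries use richer generation.
    """
    words = text.split()
    if len(words) < 2:
        return text
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


def balance_by_augmentation(rows, seed=0):
    """Pad every label up to the size of the largest class.

    `rows` is a list of (text, label) pairs. Original rows are kept
    unchanged and synthetic rows are appended after them.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append(text)

    target = max(len(texts) for texts in by_label.values())
    balanced = list(rows)
    for label, texts in by_label.items():
        for _ in range(target - len(texts)):
            source = rng.choice(texts)
            balanced.append((swap_two_words(source, rng), label))
    return balanced


# Toy dataset: three "positive" rows and one "negative" row.
rows = [
    ("the delivery arrived on time", "positive"),
    ("great product and fast shipping", "positive"),
    ("works exactly as described", "positive"),
    ("the item broke after one day", "negative"),
]
balanced = balance_by_augmentation(rows)
print(Counter(label for _, label in balanced))
```

After the call, both labels have the same number of rows. Word swapping produces low-quality text, so treat it only as a placeholder for a proper generation step.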

No commits in the last 6 months. Available on PyPI.

Use this if you are a machine learning engineer or data scientist working with text classification and your dataset has significantly fewer examples for some categories than others, leading to poor model performance on those rare categories.
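One quick way to judge whether a dataset has "significantly fewer examples" for some categories is to compare the largest and smallest class counts. The sketch below uses a hypothetical helper, `imbalance_ratio`, and made-up labels. What ratio counts as problematic is a judgment call that depends on the task.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Illustrative labels only: 90 / 8 / 2 examples per category.
labels = ["sports"] * 90 + ["politics"] * 8 + ["science"] * 2
ratio = imbalance_ratio(labels)
print(f"largest class is {ratio:.1f}x the smallest")
```

A ratio close to 1 suggests the dataset is already balanced, in which case a balancing step adds little.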

Not ideal if your dataset is already well-balanced, or if you are not working with text classification problems.

text-classification dataset-balancing natural-language-processing machine-learning-engineering

Stale 6m No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 25 / 25

Community 0 / 25

Language

Python

License

Apache-2.0

Higher-rated alternatives

fidelity/textwiser

[AAAI 2021] TextWiser: Text Featurization Library

RandolphVI/Multi-Label-Text-Classification

About Multi-Label Text Classification Based on Neural Network.

ThilinaRajapakse/pytorch-transformers-classification

Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for...

ntumlgroup/LibMultiLabel

A library for multi-class and multi-label classification

xuyige/BERT4doc-Classification

Code and source for the paper "How to Fine-Tune BERT for Text Classification?"
