pradeepdev-1995/databalancer
Databalancer is a Python library used in machine learning applications to balance imbalanced text classification datasets before model training.
This library helps machine learning practitioners prepare text data for classification models. It takes an imbalanced text dataset (e.g., a CSV file with text and categories) and generates new, synthetic text examples for under-represented categories. The output is a new, balanced dataset ready for model training, helping to improve model performance on all categories.
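As a rough illustration of what "balancing" means here, the sketch below counts the examples per category and works out how many synthetic examples each under-represented category would need. It is not databalancer's API: the function name `synthetic_needed`, the sample labels, and the choice to match every category to the size of the largest one are all assumptions made for illustration. How the library actually generates the new text is not described on this page.

```python
from collections import Counter


def synthetic_needed(labels):
    """Return, per category, how many extra examples would bring it
    up to the size of the largest category (an assumed target)."""
    counts = Counter(labels)
    if not counts:
        return {}
    target = max(counts.values())
    return {label: target - n for label, n in counts.items()}


# Hypothetical imbalanced label column: 5 / 2 / 1 examples per category.
labels = ["sports"] * 5 + ["politics"] * 2 + ["tech"] * 1
deficits = synthetic_needed(labels)
print(deficits)
```

A category with a deficit of 0 is already at the target size. The others would each need that many generated examples before training.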
No commits in the last 6 months. Available on PyPI.
Use this if you are a machine learning engineer or data scientist working with text classification and your dataset has significantly fewer examples for some categories than others, leading to poor model performance on those rare categories.
Not ideal if your dataset is already well-balanced, or if you are not working with text classification problems.
Stars: 7
Forks: —
Language: Python
License: Apache-2.0
Category:
Last pushed: Jul 08, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/pradeepdev-1995/databalancer"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
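The same request can be made from Python with only the standard library. This is a minimal sketch: the base URL is copied from the curl command above, while the helper names `quality_url` and `fetch_quality` are hypothetical. The response format is not documented on this page, so the code assumes JSON and falls back to raw text if that assumption is wrong.

```python
import json
import urllib.request

# Base path taken from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/nlp"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption (unverified): the body is JSON. If parsing fails,
    the raw response text is returned instead.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Calling `fetch_quality("pradeepdev-1995", "databalancer")` issues one request, which counts toward the daily limit mentioned above.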
Higher-rated alternatives
fidelity/textwiser
[AAAI 2021] TextWiser: Text Featurization Library
RandolphVI/Multi-Label-Text-Classification
About Multi-Label Text Classification Based on Neural Network.
ThilinaRajapakse/pytorch-transformers-classification
Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for...
ntumlgroup/LibMultiLabel
A library for multi-class and multi-label classification
xuyige/BERT4doc-Classification
Code and source for paper ``How to Fine-Tune BERT for Text Classification?``