davidschulte/hf-dataset-selector

Find the best datasets for intermediate fine-tuning

Emerging

When you're building a language model for a specific text task but lack enough training data, this tool helps you find additional, relevant datasets. You provide your target dataset and a base language model, and it outputs a ranked list of publicly available Hugging Face datasets that are most likely to improve your model's performance through an intermediate fine-tuning step. It is aimed at machine learning engineers and researchers working in natural language processing.
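The input/output contract described above (a target dataset and a base model in, a ranked list of candidate datasets out) can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool's actual API: the function `rank_intermediate_datasets`, the `score_fn` callback, the model name, and the dataset names are all hypothetical placeholders, and the toy scores stand in for whatever transferability estimate the real tool computes for each candidate.

```python
from typing import Callable, Sequence


def rank_intermediate_datasets(
    target_dataset: str,
    base_model: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str, str], float],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Hypothetical sketch: score each candidate dataset and return the best top_k.

    score_fn(target, candidate, model) is an ASSUMED callback estimating how much
    intermediate fine-tuning on `candidate` would help `model` on `target`
    (higher is better). It is a placeholder, not the library's real method.
    """
    # Skip the target itself; it is not a useful intermediate task.
    scored = [
        (candidate, score_fn(target_dataset, candidate, base_model))
        for candidate in candidates
        if candidate != target_dataset
    ]
    # Highest estimated benefit first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Usage with toy, made-up scores purely to show the shape of the output.
toy_scores = {"dataset_a": 0.42, "dataset_b": 0.87, "dataset_c": 0.15}
ranking = rank_intermediate_datasets(
    "my_target_dataset",
    "my-base-model",
    list(toy_scores),
    score_fn=lambda target, candidate, model: toy_scores[candidate],
    top_k=2,
)
print(ranking)  # [('dataset_b', 0.87), ('dataset_a', 0.42)]
```

Keeping the scoring step behind a callback separates the expensive part (estimating transferability for each candidate with the base model) from the cheap part (sorting and truncating), which is the part the description above actually specifies.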

No commits in the last 6 months.

Use this if you have a specific text classification or generation task, your proprietary training data is limited, and you want to find related datasets that can boost your language model's performance.

Not ideal if you already have ample training data for your specific task, or if your primary goal is to train a language model from scratch without leveraging existing pre-trained models or external datasets.

Tags: Natural Language Processing, Machine Learning Model Training, Text Classification, Dataset Curation

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 5 / 25

Maturity 16 / 25

Community 13 / 25

Language: Jupyter Notebook

License: Apache-2.0

Higher-rated alternatives

coetaur0/ESIM

Implementation of the ESIM model for natural language inference with PyTorch

erickrf/multiffn-nli

Implementation of the multi feed-forward network architecture by Parikh et al. (2016) for...

vanzytay/EMNLP2018_NLI

Repository for NLI models (EMNLP 2018)

hsinyuan-huang/FusionNet-NLI

An example of applying FusionNet to Natural Language Inference

sdnr1/EBIM-NLI

Enhanced BiLSTM Inference Model for Natural Language Inference