huggingface/hf_benchmarks

A starter kit for evaluating benchmarks on the 🤗 Hub

/ 100

Emerging

This toolkit helps machine learning engineers and researchers evaluate how well different language models perform on specific natural language processing tasks. You provide your model's outputs for a given benchmark, and it processes them to generate standardized metrics and comparisons against other models. The primary users are those who train or fine-tune NLP models and need to rigorously assess their performance.

No commits in the last 6 months.

Use this if you need to submit your natural language processing model's results to a community benchmark and see how it ranks against others.

Not ideal if you are looking for a tool to train or fine-tune models, or if your tasks are outside of natural language processing.

natural-language-processing machine-learning-evaluation model-benchmarking language-model-comparison

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 9 / 25

How are scores calculated?

Stars

Forks

Language

Python

License

Apache-2.0

Higher-rated alternatives

opentensor/bittensor

Internet-scale Neural Networks

trailofbits/fickling

A Python pickling decompiler and static analyzer

benchopt/benchopt

A framework for reproducible, comparable benchmarks

BiomedSciAI/fuse-med-ml

A python framework accelerating ML based discovery in the medical field by encouraging code...

mosaicml/streaming

A Data Streaming Library for Efficient Neural Network Training

Explore ML Frameworks

All categories Trending ML Framework directory Insights