ML Benchmarking Frameworks

Tools and frameworks for reproducibly benchmarking, evaluating, and comparing machine learning models across domains and datasets. This category excludes domain-specific prediction tasks, competition leaderboards, and educational coursework collections.
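Here is a minimal sketch of the core loop these tools automate: every model is scored on every dataset under a fixed list of random seeds, and each result is reported as mean and standard deviation. The models, dataset, and accuracy metric are hypothetical toy stand-ins. This is not the API of any framework listed here.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a "model" maps (inputs, seed) to predicted labels,
# and a dataset is an in-memory (inputs, labels) pair.
Model = Callable[[List[float], int], List[int]]
Dataset = Tuple[List[float], List[int]]


def accuracy(preds: List[int], labels: List[int]) -> float:
    """Fraction of predictions equal to the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def run_benchmark(models: Dict[str, Model], datasets: Dict[str, Dataset],
                  seeds: List[int]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Score every model on every dataset once per seed; report (mean, stdev)."""
    results = {}
    for model_name, model in models.items():
        for data_name, (inputs, labels) in datasets.items():
            scores = [accuracy(model(inputs, seed), labels) for seed in seeds]
            spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
            results[(model_name, data_name)] = (statistics.mean(scores), spread)
    return results


def threshold_model(inputs: List[float], seed: int) -> List[int]:
    # Deterministic toy model: ignores the seed.
    return [int(x > 0.5) for x in inputs]


def noisy_model(inputs: List[float], seed: int) -> List[int]:
    # Seeded RNG: flips about 20% of predictions, identically for a given seed.
    rng = random.Random(seed)
    return [int(x > 0.5) if rng.random() > 0.2 else int(x <= 0.5) for x in inputs]


data = {"toy": ([0.1, 0.7, 0.4, 0.9], [0, 1, 0, 1])}
models = {"threshold": threshold_model, "noisy": noisy_model}
results = run_benchmark(models, data, seeds=[0, 1, 2])
print(results[("threshold", "toy")])  # (1.0, 0.0)
```

A fixed seed list is what makes reruns comparable. Calling `run_benchmark` twice with the same seeds returns identical numbers, even for the noisy model.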

There are 40 ML benchmarking frameworks tracked. Two score above 70 (Verified tier). The highest-scoring is opentensor/bittensor, at 80/100 with 1,383 stars. Two of the top 10 are actively maintained.

Get the projects as JSON (note that this example request sets limit=20):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-benchmarking-frameworks&limit=20"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
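Below is a minimal Python equivalent of the curl request, using only the standard library. The helper names `build_query_url` and `fetch_projects` are illustrative. The sketch makes no assumptions about the structure of the returned JSON. It also omits the optional API key, because this page does not say how the key is sent.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_query_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the same query string as the curl example (parameter order preserved)."""
    params = {"domain": domain, "subcategory": subcategory, "limit": limit}
    return f"{BASE_URL}?{urllib.parse.urlencode(params)}"


def fetch_projects(url: str, timeout: float = 10.0):
    """GET the URL and decode the JSON body; the response schema is not assumed."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `fetch_projects(build_query_url("ml-frameworks", "ml-benchmarking-frameworks"))` issues the same request as the curl command. It counts against the same anonymous daily limit.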

#	Framework	Description	Score (/100)	Tier	Stars	Language
1	opentensor/bittensor	Internet-scale Neural Networks	80	Verified	1,383	Python
2	trailofbits/fickling	A Python pickling decompiler and static analyzer	72	Verified	609	Python
3	benchopt/benchopt	A framework for reproducible, comparable benchmarks	68	Established	294	Python
4	BiomedSciAI/fuse-med-ml	A python framework accelerating ML based discovery in the medical field by...	66	Established	154	Python
5	mosaicml/streaming	A Data Streaming Library for Efficient Neural Network Training	57	Established	1,472	Python
6	taoshidev/vanta-network	Vanta Network built on Bittensor	55	Established	71	Python
7	breuner/elbencho	A distributed storage benchmark for file systems, object stores & block...	53	Established	256	C++
8	google-research/zapbench	The Zebrafish Activity Prediction Benchmark measures progress on the problem...	52	Established	67	Python
9	tensorflow/model-card-toolkit	A toolkit that streamlines and automates the generation of model cards	50	Established	444	Python
10	SDNNetSim/FUSION	FUSION is an open-source project aimed at revolutionizing networking through...	48	Emerging	13	Python
11	mariusbrataas/flowpoints_ml	An intuitive approach to creating deep learning models	46	Emerging	372	JavaScript
12	heilcheng/openevals	Benchmarking suite for open-weight language models	45	Emerging	133	Python
13	aai-institute/nnbench	A small framework for benchmarking machine learning models.	44	Emerging	21	Python
14	KevinMusgrave/powerful-benchmarker	A library for ML benchmarking. It's powerful.	43	Emerging	439	Jupyter Notebook
15	google-research/rliable	[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML...	41	Emerging	866	Jupyter Notebook
16	scott-huberty/amica-python	Python Implementation of Adaptive Mixture ICA	41	Emerging	1	Python
17	SafeRL-Lab/BenchNetRL	🔥Benchmarking of Neural Network Architectures in Reinforcement Learning.	41	Emerging	34	Python
18	HanBnrd/BenchNIRS	Benchmarking framework for machine learning with fNIRS	41	Emerging	6	Python
19	florencejt/fusilli	A Python package housing a collection of deep-learning multi-modal data...	40	Emerging	198	Python
20	rllm-team/tlsql	Table Learning Structured Query Language	39	Emerging	5	Python
21	modelflows/ModelFLOWs-app	ModelFLOWs application	39	Emerging	22	Python
22	data-centric-ai/dcbench	A benchmark of data-centric tasks from across the machine learning lifecycle.	38	Emerging	71	Jupyter Notebook
23	CryAndRRich/dataflow	Decoding customer behaviors via Hybrid Neural-ML frameworks (3rd place of...	38	Emerging	1	Python
24	DACUS1995/pytorch-mmap-dataset	A custom pytorch Dataset extension that provides a faster iteration and...	38	Emerging	46	Python
25	opentensor/validators	Repository for bittensor validators	38	Emerging	16	Python
26	IvanIZ/BenchPush	BenchPush is a comprehensive benchmarking suite designed for mobile robots...	37	Emerging	18	Python
27	tcbenchstack/tcbench	tcbench is a Machine Learning and Deep Learning framework to train model...	36	Emerging	32	Jupyter Notebook
28	Jahid-Hasan1/Py-Fusion	🐍PyFusion🐍 is an open-source Python project designed to seamlessly integrate...	34	Emerging	8	Python
29	neuroprismlab/PRISME-Brain-Power-Calculator	PRISME Power Calculator	33	Emerging	4	MATLAB
30	kolesole/PredQL	PredQL is a Python framework for task generation in Relational Deep...	32	Emerging	5	Python
31	huggingface/hf_benchmarks	A starter kit for evaluating benchmarks on the 🤗 Hub	31	Emerging	16	Python
32	TorchQL/torchql	TorchQL is a query language for Python-based machine learning models and datasets.	30	Emerging	10	Python
33	nprint/benchmarks	A central repository to track the progress of network traffic analysis	29	Experimental	7	SCSS
34	Kushalk0677/Inference-Energy-and-Latency-in-AI-Mediated-Education-Green-Audit	Empirical study of inference energy, latency, and pedagogical quality for...	26	Experimental	2	Python
35	helkaroui/RapidFlow	RapidFlow is a straightforward tool for bringing machine learning models...	22	Experimental	6	JavaScript
36	yuliu625/Yu-Deep-Learning-Toolkit	A versatile deep learning toolkit providing reusable components for common...	21	Experimental	n/a	Python
37	lkopf/prism	[NeurIPS 2025] PRISM is a multi-concept feature description framework which...	21	Experimental	8	Jupyter Notebook
38	Sahilrajveer/reasonbench	📊 Evaluate machine learning models with realistic benchmarks that offer a...	14	Experimental	n/a	n/a
39	katha-ai/VELOCITI	VELOCITI Benchmark Evaluation and Visualisation Code	14	Experimental	8	Python
40	modelbench/modelbench	A tool for reproducibly benchmarking machine learning models.	11	Experimental	4	JavaScript
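The summary figures stated before the table (40 projects, two above 70, bittensor on top at 80) can be rechecked from the rows themselves. The sketch below transcribes each repository's score and tier from the table and recomputes those figures. It also reports the observed score range within each tier. These ranges are not published cutoffs: the page states only the "above 70" rule for Verified. Star counts and maintenance status are left out.

```python
from collections import Counter

# (repository, score) grouped by tier, transcribed from the table.
TABLE = {
    "Verified": [("opentensor/bittensor", 80), ("trailofbits/fickling", 72)],
    "Established": [
        ("benchopt/benchopt", 68), ("BiomedSciAI/fuse-med-ml", 66),
        ("mosaicml/streaming", 57), ("taoshidev/vanta-network", 55),
        ("breuner/elbencho", 53), ("google-research/zapbench", 52),
        ("tensorflow/model-card-toolkit", 50),
    ],
    "Emerging": [
        ("SDNNetSim/FUSION", 48), ("mariusbrataas/flowpoints_ml", 46),
        ("heilcheng/openevals", 45), ("aai-institute/nnbench", 44),
        ("KevinMusgrave/powerful-benchmarker", 43), ("google-research/rliable", 41),
        ("scott-huberty/amica-python", 41), ("SafeRL-Lab/BenchNetRL", 41),
        ("HanBnrd/BenchNIRS", 41), ("florencejt/fusilli", 40),
        ("rllm-team/tlsql", 39), ("modelflows/ModelFLOWs-app", 39),
        ("data-centric-ai/dcbench", 38), ("CryAndRRich/dataflow", 38),
        ("DACUS1995/pytorch-mmap-dataset", 38), ("opentensor/validators", 38),
        ("IvanIZ/BenchPush", 37), ("tcbenchstack/tcbench", 36),
        ("Jahid-Hasan1/Py-Fusion", 34), ("neuroprismlab/PRISME-Brain-Power-Calculator", 33),
        ("kolesole/PredQL", 32), ("huggingface/hf_benchmarks", 31),
        ("TorchQL/torchql", 30),
    ],
    "Experimental": [
        ("nprint/benchmarks", 29),
        ("Kushalk0677/Inference-Energy-and-Latency-in-AI-Mediated-Education-Green-Audit", 26),
        ("helkaroui/RapidFlow", 22), ("yuliu625/Yu-Deep-Learning-Toolkit", 21),
        ("lkopf/prism", 21), ("Sahilrajveer/reasonbench", 14),
        ("katha-ai/VELOCITI", 14), ("modelbench/modelbench", 11),
    ],
}

# Flatten into (name, score, tier) rows.
ROWS = [(name, score, tier) for tier, entries in TABLE.items() for name, score in entries]


def summarize(rows):
    """Recompute the headline figures and the observed score range per tier."""
    top_name, top_score, _ = max(rows, key=lambda row: row[1])
    ranges = {}
    for _, s, tier in rows:
        lo, hi = ranges.get(tier, (s, s))
        ranges[tier] = (min(lo, s), max(hi, s))
    return {
        "total": len(rows),
        "above_70": sum(1 for _, s, _ in rows if s > 70),
        "top": (top_name, top_score),
        "per_tier": Counter(tier for _, _, tier in rows),
        "tier_range": ranges,  # observed ranges, not official cutoffs
    }


summary = summarize(ROWS)
print(summary["total"], summary["above_70"], summary["top"])
# 40 2 ('opentensor/bittensor', 80)
```

The observed ranges are Verified 72 to 80, Established 50 to 68, Emerging 30 to 48, and Experimental 11 to 29. They match the stated Verified rule, but any other boundary values would be inference, not documentation.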

Comparisons in this category

bittensor vs. vanta-network (scores: 80 vs. 55)