ML Benchmarking Frameworks

Tools and frameworks for reproducibly benchmarking, evaluating, and comparing machine learning models across different domains and datasets. Does NOT include domain-specific prediction tasks, competition leaderboards, or educational coursework collections.

This category tracks 40 ML benchmarking frameworks. Two score above 70 (the Verified tier). The highest-rated is opentensor/bittensor at 80/100 with 1,383 stars. Two of the top 10 are actively maintained.

Get the projects as JSON (the request below sets limit=20, so it may return only 20 of the 40):

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=ml-benchmarking-frameworks&limit=20"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000/day.
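The same request can be issued from a script. The following is a minimal sketch using only the Python standard library; the endpoint and query parameters are copied from the curl command above, while the function names `build_url` and `fetch_projects` are hypothetical helpers introduced here. The shape of the JSON response is not documented on this page, so the sketch returns the decoded payload without assuming any field names, and whether the API accepts a `limit` larger than 20 is an assumption to verify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(limit=20):
    """Build the query URL for this category (hypothetical helper)."""
    params = {
        "domain": "ml-frameworks",
        "subcategory": "ml-benchmarking-frameworks",
        "limit": limit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_projects(limit=20, timeout=10):
    """Fetch and decode the JSON payload.

    The response schema is not described on this page, so the decoded
    object is returned as-is. Unauthenticated use is limited to
    100 requests/day, so avoid calling this in a tight loop.
    """
    with urlopen(build_url(limit), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_projects()` performs one network request and counts against the daily quota; `build_url()` alone is side-effect free and can be used to inspect the request first.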

# | Framework | Description | Score | Tier
1 | opentensor/bittensor | Internet-scale Neural Networks | 80 | Verified
2 | trailofbits/fickling | A Python pickling decompiler and static analyzer | 72 | Verified
3 | benchopt/benchopt | A framework for reproducible, comparable benchmarks | 68 | Established
4 | BiomedSciAI/fuse-med-ml | A python framework accelerating ML based discovery in the medical field by... | 66 | Established
5 | mosaicml/streaming | A Data Streaming Library for Efficient Neural Network Training | 57 | Established
6 | taoshidev/vanta-network | Vanta Network built on Bittensor | 55 | Established
7 | breuner/elbencho | A distributed storage benchmark for file systems, object stores & block... | 53 | Established
8 | google-research/zapbench | The Zebrafish Activity Prediction Benchmark measures progress on the problem... | 52 | Established
9 | tensorflow/model-card-toolkit | A toolkit that streamlines and automates the generation of model cards | 50 | Established
10 | SDNNetSim/FUSION | FUSION is an open-source project aimed at revolutionizing networking through... | 48 | Emerging
11 | mariusbrataas/flowpoints_ml | An intuitive approach to creating deep learning models | 46 | Emerging
12 | heilcheng/openevals | Benchmarking suite for open-weight language models | 45 | Emerging
13 | aai-institute/nnbench | A small framework for benchmarking machine learning models. | 44 | Emerging
14 | KevinMusgrave/powerful-benchmarker | A library for ML benchmarking. It's powerful. | 43 | Emerging
15 | google-research/rliable | [NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML... | 41 | Emerging
16 | scott-huberty/amica-python | Python Implementation of Adaptive Mixture ICA | 41 | Emerging
17 | SafeRL-Lab/BenchNetRL | 🔥Benchmarking of Neural Network Architectures in Reinforcement Learning. | 41 | Emerging
18 | HanBnrd/BenchNIRS | Benchmarking framework for machine learning with fNIRS | 41 | Emerging
19 | florencejt/fusilli | A Python package housing a collection of deep-learning multi-modal data... | 40 | Emerging
20 | rllm-team/tlsql | Table Learning Structured Query Language | 39 | Emerging
21 | modelflows/ModelFLOWs-app | ModelFLOWs application | 39 | Emerging
22 | data-centric-ai/dcbench | A benchmark of data-centric tasks from across the machine learning lifecycle. | 38 | Emerging
23 | CryAndRRich/dataflow | Decoding customer behaviors via Hybrid Neural-ML frameworks (3rd place of... | 38 | Emerging
24 | DACUS1995/pytorch-mmap-dataset | A custom pytorch Dataset extension that provides a faster iteration and... | 38 | Emerging
25 | opentensor/validators | Repository for bittensor validators | 38 | Emerging
26 | IvanIZ/BenchPush | BenchPush is a comprehensive benchmarking suite designed for mobile robots... | 37 | Emerging
27 | tcbenchstack/tcbench | tcbench is a Machine Learning and Deep Learning framework to train model... | 36 | Emerging
28 | Jahid-Hasan1/Py-Fusion | 🐍PyFusion🐍 is an open-source Python project designed to seamlessly integrate... | 34 | Emerging
29 | neuroprismlab/PRISME-Brain-Power-Calculator | PRISME Power Calculator | 33 | Emerging
30 | kolesole/PredQL | PredQL is a Python framework for task generation in Relational Deep... | 32 | Emerging
31 | huggingface/hf_benchmarks | A starter kit for evaluating benchmarks on the 🤗 Hub | 31 | Emerging
32 | TorchQL/torchql | TorchQL is a query language for Python-based machine learning models and datasets. | 30 | Emerging
33 | nprint/benchmarks | A central repository to track the progress of network traffic analysis | 29 | Experimental
34 | Kushalk0677/Inference-Energy-and-Latency-in-AI-Mediated-Education-Green-Audit | Empirical study of inference energy, latency, and pedagogical quality for... | 26 | Experimental
35 | helkaroui/RapidFlow | RapidFlow is a straightforward tool for bringing machine learning models... | 22 | Experimental
36 | yuliu625/Yu-Deep-Learning-Toolkit | A versatile deep learning toolkit providing reusable components for common... | 21 | Experimental
37 | lkopf/prism | [NeurIPS 2025] PRISM is a multi-concept feature description framework which... | 21 | Experimental
38 | Sahilrajveer/reasonbench | 📊 Evaluate machine learning models with realistic benchmarks that offer a... | 14 | Experimental
39 | katha-ai/VELOCITI | VELOCITI Benchmark Evaluation and Visualisation Code | 14 | Experimental
40 | modelbench/modelbench | A tool for reproducibly benchmarking machine learning models. | 11 | Experimental
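The relationship between score and tier in the table above can be sketched as a simple threshold lookup. The cutoffs below (70, 50, 30) are inferred from the listed rows, for example 72 is Verified while 68 is Established, and 30 is Emerging while 29 is Experimental. They are an assumption, not a documented rule, and the page does not show how a score of exactly 70 is classified. The name `tier_for` is a hypothetical helper.

```python
# Tier floors inferred from the rows in the table above (assumption:
# the site may use different or additional criteria).
TIER_FLOORS = [
    (70, "Verified"),
    (50, "Established"),
    (30, "Emerging"),
]


def tier_for(score):
    """Map a 0-100 quality score to a tier name using the inferred floors."""
    for floor, name in TIER_FLOORS:
        if score >= floor:
            return name
    return "Experimental"


# A few (project, score) pairs copied from the table, used as a spot check.
sample_scores = {
    "opentensor/bittensor": 80,
    "benchopt/benchopt": 68,
    "tensorflow/model-card-toolkit": 50,
    "SDNNetSim/FUSION": 48,
    "TorchQL/torchql": 30,
    "nprint/benchmarks": 29,
}

tiers = {project: tier_for(score) for project, score in sample_scores.items()}
```

With these floors, every sampled row reproduces the tier shown in the table, which is consistent with, but does not prove, a pure score-threshold rule.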

Comparisons in this category