OrderLab/TrainCheck
An Observability Framework for AI Training
This tool helps AI engineers and researchers ensure the reliability of their deep learning training. It takes your existing training script and automatically learns what a healthy training run looks like, then monitors new runs in real time, catching silent bugs and hardware issues that standard metrics miss and raising alerts when a run deviates from the learned behavior.
Available on PyPI.
Use this if you need to proactively identify subtle errors in your AI model training that can waste significant GPU resources and time.
Not ideal if you are looking for a tool to evaluate model performance after training or for hyperparameter optimization.
Stars
66
Forks
3
Language
Python
License
—
Category
—
Last pushed
Mar 08, 2026
Commits (30d)
0
Dependencies
9
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/OrderLab/TrainCheck"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
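The same endpoint can be queried from Python with only the standard library. This is a minimal sketch: the response schema is not documented here, so the JSON is returned as-is, and the `quality_url`/`fetch_quality` helper names are illustrative, not part of the API.

```python
import json
from urllib.request import urlopen

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-data endpoint URL for a repository."""
    return f"{API_BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch quality data for one repo. Anonymous access: 100 requests/day."""
    with urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example (performs a live request, counts against the daily quota):
#   data = fetch_quality("OrderLab", "TrainCheck")
#   print(json.dumps(data, indent=2))
```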
Higher-rated alternatives
modelscope/modelscope
ModelScope: bring the notion of Model-as-a-Service to life.
basetenlabs/truss
The simplest way to serve AI/ML models in production
Lightning-AI/LitServe
A minimal Python framework for building custom AI inference servers with full control over...
deepjavalibrary/djl-serving
A universal scalable machine learning model deployment solution
tensorflow/serving
A flexible, high-performance serving system for machine learning models