cotesiito/flashtensors
🚀 Accelerate your AI projects with flashtensors, a fast inference engine that loads large models on a single GPU in under 2 seconds.
This application helps you quickly load and switch between large AI models on a single GPU. It takes your saved model files and rapidly moves them into your GPU's memory, allowing you to run predictions or analyses almost instantly. This is ideal for AI researchers, data scientists, or developers who frequently test or use multiple large models.
Use this if you need to rapidly load and swap between various large AI models on a single GPU without long wait times.
Not ideal if you primarily work with smaller models that load quickly or if you are using cloud-based GPU resources where model loading is managed differently.
Stars: 10
Forks: 1
Language: Python
License: —
Category: ml-frameworks
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/cotesiito/flashtensors"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit...
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
:us: a python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning