quic/efficient-transformers
This library lets AI developers and machine learning engineers port pretrained models and checkpoints from the Hugging Face (HF) hub (built with the HF transformers library) into optimized, inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators. It converts large language, vision, and audio models for high-performance inference, targeting practitioners deploying models in production on Qualcomm's cloud accelerators.
Use this if you need to deploy large AI models, whether text, image, or audio, for efficient, high-performance inference on Qualcomm Cloud AI 100 accelerators.
Not ideal if you are not targeting Qualcomm Cloud AI 100 hardware, or if you only need to train models rather than optimize them for deployment.
Stars
87
Forks
75
Language
Python
License
—
Category
—
Last pushed
Mar 13, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/quic/efficient-transformers"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
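The curl call above can also be made from Python with the standard library. This is a minimal sketch: the path layout (`/quality/<category>/<owner>/<repo>`) is inferred from the single example URL and is an assumption, not documented API behavior, and the JSON shape of the response is unspecified here.

```python
import json
import urllib.request

# Assumption: path segments are /api/v1/quality/<category>/<owner>/<repo>,
# inferred from the one example URL shown above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(owner: str, repo: str, category: str = "transformers") -> str:
    """Build the quality-API URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("quic", "efficient-transformers"))
```

With an API key (once obtained), you would raise the limit to 1,000 requests/day; how the key is passed (header vs. query parameter) is not stated on this page, so it is omitted here.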
Related models
ManuelSLemos/RabbitLLM
Run 70B+ LLMs on a single 4GB GPU — no quantization required.
alpa-projects/alpa
Training and serving large-scale neural networks with auto parallelization.
arm-education/Advanced-AI-Hardware-Software-Co-Design
Hands-on course materials for ML engineers to master extreme model quantization and on-device...
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes...
deepreinforce-ai/CUDA-L2
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning