quic/efficient-transformers
This library lets AI developers and machine learning engineers port pretrained models and checkpoints from the Hugging Face (HF) hub (built with the HF transformers library) into optimized, inference-ready formats that run efficiently on Qualcomm Cloud AI 100 accelerators. It converts large language, vision, and audio models for high-performance inference, targeting practitioners deploying models in production on Qualcomm's cloud accelerators.
Use this if you need to deploy large AI models, whether text, image, or audio, for efficient, high-performance inference on Qualcomm Cloud AI 100 accelerators.
Not ideal if you are not targeting Qualcomm Cloud AI 100 hardware, or if you only need to train models rather than optimize them for deployment.
Stars
87
Forks
75
Language
Python
License
—
Category
—
Last pushed
Mar 13, 2026
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/quic/efficient-transformers"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
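The curl call above can also be made from Python with the standard library. This is a minimal sketch: the path layout (`/quality/<category>/<owner>/<repo>`) is inferred from the single example URL and is an assumption, not documented API behavior, and the JSON shape of the response is unspecified here.

```python
import json
import urllib.request

# Assumption: path segments are /api/v1/quality/<category>/<owner>/<repo>,
# inferred from the one example URL shown above.
BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(owner: str, repo: str, category: str = "transformers") -> str:
    """Build the quality-API URL for a repository."""
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the JSON payload (100 requests/day without a key)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

print(quality_url("quic", "efficient-transformers"))
```

With an API key (once obtained), you would raise the limit to 1,000 requests/day; how the key is passed (header vs. query parameter) is not stated on this page, so it is omitted here.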
Related models
ManuelSLemos/RabbitLLM
Run 70B+ LLMs on a single 4GB GPU — no quantization required.
alpa-projects/alpa
Training and serving large-scale neural networks with auto parallelization.
arm-education/Advanced-AI-Hardware-Software-Co-Design
Hands-on course materials for ML engineers to master extreme model quantization and on-device...
IST-DASLab/marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes...
deepreinforce-ai/CUDA-L2
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning