OpenBMB/BMInf
Efficient Inference for Big Models
BMInf helps machine learning engineers and researchers run very large language models, such as those used for text generation or question answering, on modest hardware. It wraps an existing large model so that inference runs efficiently even on a single consumer-grade GPU, producing the same results with significantly lower memory requirements and better throughput.
587 stars. No commits in the last 6 months.
Use this if you need to deploy or experiment with extremely large pre-trained language models (10+ billion parameters) but are limited by GPU memory or want to achieve better performance on powerful GPUs.
Not ideal if you are working with smaller models that already fit comfortably within your GPU's memory or if you prefer not to modify your model's internal structure for optimization.
Stars: 587
Forks: 66
Language: Python
License: Apache-2.0
Category: (not specified)
Last pushed: Jan 24, 2023
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/OpenBMB/BMInf"
The API is open to everyone at 100 requests/day with no key required; a free key raises the limit to 1,000/day.
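The same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint path is taken from the curl command above, while the helper names (`quality_url`, `fetch_quality`) and the assumption that the endpoint returns JSON are illustrative.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a repository, e.g. OpenBMB/BMInf."""
    return f"{BASE}/ml-frameworks/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the report (requires network access; assumes JSON)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# The built URL matches the curl command shown above.
print(quality_url("OpenBMB", "BMInf"))
```

With a free API key, you would typically attach it as a header or query parameter per the service's documentation before calling `fetch_quality`.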
Higher-rated alternatives
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit...
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
A Python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning