OpenBMB/BMInf
Efficient Inference for Big Models
BMInf helps machine learning engineers and researchers run very large language models, such as those used for text generation or question answering, on modest hardware. It wraps an existing large model so that inference runs efficiently even on a single consumer-grade GPU, producing the same results with significantly lower memory requirements and better throughput.
587 stars. No commits in the last 6 months.
Use this if you need to deploy or experiment with extremely large pre-trained language models (10+ billion parameters) but are limited by GPU memory or want to achieve better performance on powerful GPUs.
Not ideal if you are working with smaller models that already fit comfortably within your GPU's memory or if you prefer not to modify your model's internal structure for optimization.
Stars: 587
Forks: 66
Language: Python
License: Apache-2.0
Category: (not specified)
Last pushed: Jan 24, 2023
Commits (last 30 days): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/OpenBMB/BMInf"
The API is open to everyone at 100 requests/day with no key required; a free key raises the limit to 1,000/day.
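The same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint path is taken from the curl command above, while the helper names (`quality_url`, `fetch_quality`) and the assumption that the endpoint returns JSON are illustrative.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(owner: str, repo: str) -> str:
    """Build the quality-report URL for a repository, e.g. OpenBMB/BMInf."""
    return f"{BASE}/ml-frameworks/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch and decode the report (requires network access; assumes JSON)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# The built URL matches the curl command shown above.
print(quality_url("OpenBMB", "BMInf"))
```

With a free API key, you would typically attach it as a header or query parameter per the service's documentation before calling `fetch_quality`.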
Higher-rated alternatives
NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit...
mlcommons/inference
Reference implementations of MLPerf® inference benchmarks
mlcommons/training
Reference implementations of MLPerf® training benchmarks
datamade/usaddress
A Python library for parsing unstructured United States address strings into address components
GRAAL-Research/deepparse
Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning