zhihu/cuBERT

Fast implementation of BERT inference directly on NVIDIA (CUDA, cuBLAS) or Intel MKL

Quality score: 47 / 100 (Emerging)

This project dramatically speeds up text analysis with BERT models. It takes raw text and quickly produces classifications (such as sentiment or topic), extracted features, or pooled representations. It is aimed at developers and MLOps engineers who need to deploy and run BERT-based natural language processing applications at scale, without the overhead of a full machine learning framework.

549 stars. No commits in the last 6 months.

Use this if you need to perform BERT inference with significantly reduced latency and higher throughput, especially in production environments where speed and efficiency are critical for text processing tasks.

Not ideal if you need to train BERT models, perform tasks beyond standard BERT inference, or if your application does not involve high-volume, performance-critical text analysis.

Tags: Natural Language Processing · Text Analytics · Machine Learning · Inference · Deep Learning · Deployment · High-Performance Computing
Badges: Stale (6m) · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 21 / 25

How are scores calculated?

Stars: 549
Forks: 84
Language: C++
License: MIT
Last pushed: Nov 18, 2020
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/zhihu/cuBERT"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.