andrewkchan/deepseek.cpp

CPU inference for the DeepSeek family of large language models in C++

Score: 44 / 100 (Emerging)

This project helps developers test and experiment with DeepSeek-family large language models directly on their computer's CPU, without needing specialized hardware. It takes DeepSeek model weights as input and can generate text completions, calculate perplexity, or run in interactive mode. This is useful for engineers and researchers who want to understand and optimize the performance of DeepSeek models on standard CPU infrastructure.

315 stars. No commits in the last 6 months.

Use this if you are a developer or researcher focused on CPU-only inference for DeepSeek models and want a hackable, minimalist codebase for experimentation.

Not ideal if you need a production-ready, high-performance inference solution that supports various LLM architectures or GPU acceleration.

LLM-development model-inference CPU-optimization DeepSeek-models AI-research
Stale 6m No Package No Dependents
Maintenance 2 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 16 / 25


Stars: 315
Forks: 34
Language: C++
License:
Last pushed: Oct 02, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/andrewkchan/deepseek.cpp"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
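The endpoint above can also be called from a script. A minimal Python sketch (the response schema is not documented here, so this just fetches and prints the raw JSON; the helper names are illustrative):

```python
import json
import urllib.request

# Base path taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(owner: str, repo: str) -> str:
    # Build the per-repository endpoint shown in the listing above.
    return f"{BASE}/{owner}/{repo}"

def fetch_quality(owner: str, repo: str) -> dict:
    # Free tier: 100 requests/day without a key, per the listing.
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)

# Example usage (performs a network request):
# data = fetch_quality("andrewkchan", "deepseek.cpp")
# print(json.dumps(data, indent=2))
```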