kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference
Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
This project helps teams who need to extract specific information from long documents, like annual reports or legal filings, by asking questions in plain language. You input your documents and your questions, and it provides direct answers. This is ideal for analysts, researchers, or anyone handling sensitive information that can't be shared with external AI services.
974 stars. No commits in the last 6 months.
Use this if you need to run a question-answering system on your own private documents without relying on external cloud-based AI services, especially due to data privacy or cost concerns.
Not ideal if you're comfortable using commercial AI services like OpenAI's GPT-4 or if you need to process extremely large volumes of data very quickly on high-end GPUs.
Stars: 974
Forks: 207
Language: Python
License: MIT
Category:
Last pushed: Nov 06, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference"
Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
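The same endpoint can be queried from Python. A minimal sketch using only the standard library, assuming the endpoint returns JSON (the response field names are not documented here, so the parsed dictionary is returned as-is):

```python
import json
import urllib.request

# Endpoint from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"
repo = "kennethleungty/Llama-2-Open-Source-LLM-CPU-Inference"
url = f"{BASE}/{repo}"

def fetch_quality(url: str) -> dict:
    """Fetch the quality data for a repo and parse the JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Calling `fetch_quality(url)` performs a live HTTP request, so it counts against the 100-requests/day limit noted above.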
Related models
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
sgl-project/sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
alibaba/MNN
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering...
xorbitsai/inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source,...
tensorzero/tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM...