ustc-sunny/Awsome-RAG-LLM-Inference-System

A survey of LLM inference with RAG on limited resources, from a systems perspective.

Score: 29 / 100 (Experimental)

This project offers a comprehensive overview of techniques for building efficient Retrieval-Augmented Generation (RAG) systems, especially in scenarios with limited computing resources such as GPU memory and compute. It compiles research on optimizing how Large Language Models (LLMs) find and use information, covering methods to speed up vector searches and manage data effectively. Developers and system architects deploying RAG-based AI applications will find it useful for designing faster, more resource-efficient systems.

Use this if you are a system architect or engineer looking for advanced techniques to optimize the performance and resource usage of large-scale Retrieval-Augmented Generation (RAG) systems.

Not ideal if you are a business user looking for a pre-built RAG application or a data scientist focused solely on model training rather than system-level inference optimization.

AI-system-design LLM-deployment resource-optimization vector-database inference-engineering
No License · No Package · No Dependents
Maintenance 10 / 25
Adoption 4 / 25
Maturity 7 / 25
Community 8 / 25


Stars: 8
Forks: 1
Language: (none listed)
License: (none)
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/ustc-sunny/Awsome-RAG-LLM-Inference-System"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
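The same endpoint can be called programmatically. A minimal Python sketch, assuming the URL follows a `/quality/<category>/<owner>/<repo>` path pattern (the `rag` segment is inferred from the example above and may be a fixed category):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the quality-score endpoint URL; the <category>/<owner>/<repo>
    # path pattern is an assumption inferred from the curl example above.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # No API key needed for up to 100 requests/day, per the note above.
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("rag", "ustc-sunny", "Awsome-RAG-LLM-Inference-System")
print(url)
```

The response format is not documented here; `fetch_quality` simply assumes the endpoint returns JSON, as the `curl` example suggests.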