ustc-sunny/Awsome-RAG-LLM-Inference-System
A survey of LLM inference with RAG under limited resources, approached from the systems perspective.
This project offers a comprehensive overview of techniques to build efficient Retrieval-Augmented Generation (RAG) systems, especially for scenarios with limited computing resources like GPUs. It compiles research on optimizing how Large Language Models (LLMs) find and use information, covering methods to speed up vector searches and manage data effectively. Developers and system architects involved in deploying RAG-based AI applications will find this useful for designing faster, more resource-efficient systems.
Use this if you are a system architect or engineer looking for advanced techniques to optimize the performance and resource usage of large-scale Retrieval-Augmented Generation (RAG) systems.
Not ideal if you are a business user looking for a pre-built RAG application or a data scientist focused solely on model training rather than system-level inference optimization.
Stars: 8
Forks: 1
Language: —
License: —
Category: —
Last pushed: Mar 13, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/ustc-sunny/Awsome-RAG-LLM-Inference-System"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
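The curl command above can also be wrapped in a short script. The sketch below builds the per-repository endpoint URL and fetches the record with Python's standard library; note that only the URL pattern is given here, so the shape of the JSON response (and the `fetch_quality` helper itself) is an assumption, not documented API behavior.

```python
import json
import urllib.request

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_url(owner: str, repo: str) -> str:
    """Construct the per-repository quality endpoint URL."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record for a repository.

    The response is assumed to be JSON; the actual schema is not
    documented here, so inspect the returned dict before relying on keys.
    """
    with urllib.request.urlopen(build_url(owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Free tier: 100 requests/day without a key, per the note above.
    print(fetch_quality("ustc-sunny", "Awsome-RAG-LLM-Inference-System"))
```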
Higher-rated alternatives
LearningCircuit/local-deep-research
Local Deep Research achieves ~95% on SimpleQA benchmark (tested with GPT-4.1-mini). Supports...
NVIDIA-AI-Blueprints/rag
This NVIDIA RAG blueprint serves as a reference solution for a foundational Retrieval Augmented...
Denis2054/RAG-Driven-Generative-AI
This repository provides programs to build Retrieval Augmented Generation (RAG) code for...
hienhayho/rag-colls
Collection of recent advanced RAG techniques.
jeremiahbohr/literature-mapper
Transform academic PDFs into a Knowledge Graph with typed claims, temporal analysis,...