ustc-sunny/Awsome-RAG-LLM-Inference-System

A survey of LLM inference with RAG on limited resources, from a systems perspective.

Score: 29 / 100 (Experimental)

This project offers a comprehensive overview of techniques for building efficient Retrieval-Augmented Generation (RAG) systems, especially in scenarios with limited computing resources such as GPU memory and compute. It compiles research on optimizing how Large Language Models (LLMs) find and use information, covering methods to speed up vector searches and manage data effectively. Developers and system architects deploying RAG-based AI applications will find it useful for designing faster, more resource-efficient systems.

Use this if you are a system architect or engineer looking for advanced techniques to optimize the performance and resource usage of large-scale Retrieval-Augmented Generation (RAG) systems.

Not ideal if you are a business user looking for a pre-built RAG application or a data scientist focused solely on model training rather than system-level inference optimization.

AI-system-design LLM-deployment resource-optimization vector-database inference-engineering
No License · No Package · No Dependents
Maintenance 10 / 25
Adoption 4 / 25
Maturity 7 / 25
Community 8 / 25


Stars: 8
Forks: 1
Language: (none listed)
License: (none)
Last pushed: Mar 13, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/ustc-sunny/Awsome-RAG-LLM-Inference-System"

Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
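The same endpoint can be called programmatically. A minimal Python sketch, assuming the URL follows a `/quality/<category>/<owner>/<repo>` path pattern (the `rag` segment is inferred from the example above and may be a fixed category):

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the quality-score endpoint URL; the <category>/<owner>/<repo>
    # path pattern is an assumption inferred from the curl example above.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # No API key needed for up to 100 requests/day, per the note above.
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("rag", "ustc-sunny", "Awsome-RAG-LLM-Inference-System")
print(url)
```

The response format is not documented here; `fetch_quality` simply assumes the endpoint returns JSON, as the `curl` example suggests.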