OceanPresentChao/llm-corpus

Build an LLM Knowledge Base from Scratch (Build LLM RAG Corpus from scratch)

/ 100

Experimental

This project helps you build a custom knowledge base for large language models (LLMs) from scratch. It takes your Chinese text documents, processes them, converts them into a format that LLMs can understand, and stores them in a searchable database. The output is a functional chatbot that can answer questions using the information in your specific documents. This is for developers, researchers, or data scientists looking to create tailored LLM applications for specific domains.
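The pipeline described above (process documents, convert them to a searchable representation, retrieve relevant passages for a question) can be sketched roughly as follows. This is a minimal illustration only, not the project's actual code: `chunk_text`, `embed`, `TinyVectorStore`, and `build_prompt` are hypothetical names, the character-bigram "embedding" is a stand-in for a real embedding model, and the in-memory list is a stand-in for a real vector database.

```python
import math
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping character windows (hypothetical chunking rule)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Toy embedding: character-bigram counts. A real system would call an embedding model."""
    cleaned = "".join(ch for ch in text if not ch.isspace())
    return Counter(cleaned[i:i + 2] for i in range(len(cleaned) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """In-memory stand-in for a vector database (assumption, for illustration)."""

    def __init__(self):
        self.items = []  # list of (chunk, vector) pairs

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into a single LLM prompt."""
    context = "\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = TinyVectorStore()
    for doc in ["向量数据库用于存储文本的嵌入向量", "今天天气晴朗适合出门散步"]:
        for chunk in chunk_text(doc):
            store.add(chunk)
    question = "什么是向量数据库"
    print(build_prompt(question, store.search(question, k=1)))
```

In a real deployment the resulting prompt would be sent to whichever LLM you choose. Character bigrams are used here only because they need no word segmentation for Chinese text and no external dependencies.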

No commits in the last 6 months.

Use this if you need to create a specialized chatbot or question-answering system that uses your own collection of documents, rather than general internet knowledge.

Not ideal if you are a non-technical user looking for a ready-to-use application without any coding or model setup.

knowledge-base-creation, LLM-customization, chatbot-development, information-retrieval, NLP-engineering

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 8 / 25

Community 12 / 25


Language: Python

License: none

Higher-rated alternatives

yichuan-w/LEANN

[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast,...

byerlikaya/SmartRAG

Multi-Modal RAG for .NET — query databases, documents, images and audio in natural language....

aws-samples/layout-aware-document-processing-and-retrieval-augmented-generation

Advanced document extraction and chunking techniques for retrieval augmented generation that is...

sourangshupal/simple-rag-langchain

Exploring the Basics of Langchain

sion42x/llama-index-milvus-example

Open AI APIs with Llama Index and Milvus Vector DB for Retrieval Augmented Generation (RAG) testing
