OceanPresentChao/llm-corpus
Build an LLM knowledge base from scratch (Build LLM RAG Corpus from scratch)
This project helps you build a custom knowledge base for large language models (LLMs) from scratch. It takes your Chinese text documents, processes them, converts them into a format that LLMs can understand, and stores them in a searchable database. The result is a working chatbot that answers questions using the information in your own documents. It is aimed at developers, researchers, and data scientists building LLM applications for a specific domain.
No commits in the last 6 months.
Use this if you need to create a specialized chatbot or question-answering system that uses your own collection of documents, rather than general internet knowledge.
Not ideal if you are a non-technical user looking for a ready-to-use application without any coding or model setup.
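The flow described above (process documents, convert them to a searchable representation, store them, then answer from what is retrieved) can be illustrated with a minimal sketch. This is not the repository's code: `chunk_text`, `embed`, `ToyVectorStore`, and `build_prompt` are hypothetical names, the character-bigram "embedding" is a stand-in for a real embedding model, and the in-memory list is a stand-in for a real vector database.

```python
import math
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping character windows.

    Character windows are a simple choice for Chinese text, which has no
    spaces between words. The sizes here are illustrative assumptions.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text):
    """Toy embedding: counts of character bigrams (assumption, not a real model)."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity between two sparse bigram-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk, vector) pairs

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, contexts):
    """Assemble the text that would be sent to an LLM (the LLM call is omitted)."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only the context below.\n{joined}\nQuestion: {question}"


# Example: index two short documents, then retrieve context for a question.
store = ToyVectorStore()
for doc in ["向量数据库用于存储文本的向量表示。", "大语言模型可以根据检索到的内容回答问题。"]:
    for chunk in chunk_text(doc):
        store.add(chunk)

question = "向量数据库存储什么?"
prompt = build_prompt(question, store.search(question, k=1))
```

In a real deployment the bigram counter would be replaced by an embedding model and the list by a vector database; the retrieval-then-prompt shape stays the same.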
Stars: 86
Forks: 9
Language: Python
License: —
Category:
Last pushed: Oct 23, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/vector-db/OceanPresentChao/llm-corpus"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
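The same request can be made from Python. The sketch below is hedged on two assumptions that the page does not confirm: that the path segments after `/quality/` are category, owner, and repository name (inferred from the single URL shown), and that the response body is JSON. `quality_url` and `fetch_quality` are hypothetical helper names.

```python
import json
import urllib.request

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL; the segment order is inferred, not documented here."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch and decode the response, assuming it is JSON (unverified assumption)."""
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("vector-db", "OceanPresentChao", "llm-corpus")` issues the same GET request as the curl command and counts against the daily request limit.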
Higher-rated alternatives
yichuan-w/LEANN
[MLsys2026]: RAG on Everything with LEANN. Enjoy 97% storage savings while running a fast,...
byerlikaya/SmartRAG
Multi-Modal RAG for .NET — query databases, documents, images and audio in natural language....
aws-samples/layout-aware-document-processing-and-retrieval-augmented-generation
Advanced document extraction and chunking techniques for retrieval augmented generation that is...
sourangshupal/simple-rag-langchain
Exploring the Basics of LangChain
sion42x/llama-index-milvus-example
OpenAI APIs with LlamaIndex and Milvus Vector DB for Retrieval Augmented Generation (RAG) testing