Retrieval-Augmented Generation NLP Tools

Tools and frameworks implementing RAG systems that combine document retrieval with LLM-based generation for knowledge-base question answering, semantic search, and context-aware responses. Does NOT include general information retrieval, semantic search without LLM integration, or knowledge graph construction without retrieval components.
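To make the scope above concrete, here is a minimal sketch of the retrieve-then-generate pattern these tools share. It is not taken from any listed project. Bag-of-words cosine similarity stands in for the embedding models and vector stores (such as ChromaDB) that real pipelines use. The generation step stops at assembling the augmented prompt, so no LLM client is involved. The names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Hypothetical retriever: return the k docs most similar to the query.

    Bag-of-words cosine is a stand-in for dense-embedding search.
    Ties keep their original order because sorted() is stable.
    """
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Hypothetical prompt builder: number the retrieved passages so an
    LLM could cite them, then append the question."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{numbered}\n"
        f"Question: {query}\nAnswer:"
    )
```

In a fuller system, `retrieve` would typically query an embedding index, and the string from `build_prompt` would be sent to an LLM to produce the grounded answer.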

There are 21 retrieval-augmented generation tools tracked. 3 score 50 or higher (Established tier). The highest-rated is web-arena-x/webarena at 55/100 with 1,398 stars.

Get all 21 projects as JSON

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=nlp&subcategory=retrieval-augmented-generation&limit=21"

Open to everyone: 100 requests/day, no key needed. A free key raises the limit to 1,000 requests/day.
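The curl call above can also be made from code. Below is a minimal Python sketch using only the standard library. It assumes the endpoint accepts anonymous GET requests exactly as in the curl example. `build_quality_url` and `fetch_quality` are hypothetical helper names. The response schema is not described on this page, so the decoded JSON is returned unchanged rather than mapped onto specific fields. No API key is sent, which keeps the sketch within the anonymous 100 requests/day limit. How a key would be passed is not documented here, so that step is left out.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the curl example; the parameters below mirror its query string.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_quality_url(domain: str, subcategory: str, limit: int) -> str:
    """Hypothetical helper: build the query URL from its three parameters.

    urlencode keeps dict insertion order, so the query string matches
    the curl example's parameter order.
    """
    query = urllib.parse.urlencode(
        {"domain": domain, "subcategory": subcategory, "limit": limit}
    )
    return f"{BASE_URL}?{query}"


def fetch_quality(domain: str, subcategory: str, limit: int, timeout: float = 30.0):
    """Hypothetical helper: anonymous GET (no key), returning the decoded JSON.

    The JSON structure is assumed unknown, so it is returned as-is.
    """
    url = build_quality_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "retrieval-augmented-generation", limit=...)` issues a request with the same three parameters as the curl command. Each call counts against the daily request limit.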

| Rank | Tool | Description | Score | Tier |
|---|---|---|---|---|
| 1 | web-arena-x/webarena | Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents" | 55 | Established |
| 2 | nabeelxy/syara | SYARA: Super YARA Rules for GenAI Era | 51 | Established |
| 3 | princeton-nlp/WebShop | [NeurIPS 2022] 🛒WebShop: Towards Scalable Real-World Web Interaction with... | 50 | Established |
| 4 | X-LANCE/Mobile-Env | A Universal Platform for Training and Evaluation of Mobile Interaction | 37 | Emerging |
| 5 | shbernal/pdfanki | Create Anki decks from PDF/EPUB files using NLP with LLMs. | 35 | Emerging |
| 6 | princeton-nlp/lwm | We develop world models that can be adapted with natural language.... | 33 | Emerging |
| 7 | zimingyou01/DatawiseAgent | [EMNLP 2025] DatawiseAgent: A Notebook-Centric LLM Agent Framework for... | 32 | Emerging |
| 8 | dinhanhx/cakewalk-rag | A very simple RAG implementation | 30 | Emerging |
| 9 | jitinkrishnan/NASA-SE | A Virtual Assistant for NASA's Systems Engineers (AAAI-MAKE '19 '20) | 27 | Experimental |
| 10 | khoj-ai/lantern | Lantern manages a waitlist for Khoj. It used to be a lot more, but now it's simple! | 24 | Experimental |
| 11 | poojakira/semantic-rag-engine | Production-grade RAG pipeline with LangChain, ChromaDB, and semantic search.... | 22 | Experimental |
| 12 | DominicMukilan/ithkuil-grammar-copilot | RAG+validation system demonstrating LLM accuracy improvement from 65% to 95% | 20 | Experimental |
| 13 | antononcube/Python-LLMTextualAnswer | Python package for finding textual answers via LLMs. | 19 | Experimental |
| 14 | KrishnaNarkhede/NLP-DataProcessor | LLM Based Data Analyst Assistant : AI-powered data analysis platform that... | 18 | Experimental |
| 15 | pahul0303/llassist | A tool for processing and analyzing research articles using NLP and Large... | 18 | Experimental |
| 16 | wyu97/RACo | Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified... | 18 | Experimental |
| 17 | isurulkh/Small-Language-Model | SmallDisMed is a fine-tuned GPT-2 model pre-trained on a medical dataset for... | 18 | Experimental |
| 18 | hager51/Chatbot | Question Answering System For COVID-19 Questions Using NLP Techniques | 16 | Experimental |
| 19 | Dhanunjaya18/AI-Classroom-Doubt-Detector | Developed an AI-powered Classroom Doubt Detection system that analyzes... | 14 | Experimental |
| 20 | MathieuDesponds/Information-extraction-in-official-documents-using-LLMs | Assessed MistralAI-7B capabilities for document information extraction while... | 10 | Experimental |
| 21 | hakancangunerli/CitationLLM | An LLM that can retrieve and provide LLM answers with Citations at another page. | 10 | Experimental |
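The tier labels in the table appear to follow fixed score bands. Below is a minimal sketch of that mapping. It assumes cut-offs of 50 for Established and 30 for Emerging. These values are inferred from the listed scores, since the page does not state the exact boundaries. `tier` and `tier_counts` are hypothetical names. The scores are copied from the table in rank order.

```python
from collections import Counter

# Assumed cut-offs, inferred from the table: the lowest Established score is 50,
# the lowest Emerging score is 30, and the highest Experimental score is 27.
ESTABLISHED_MIN = 50
EMERGING_MIN = 30

# Scores of the 21 tracked tools, copied from the table in rank order.
SCORES = [55, 51, 50, 37, 35, 33, 32, 30, 27, 24, 22,
          20, 19, 18, 18, 18, 18, 16, 14, 10, 10]


def tier(score: int) -> str:
    """Map a 0-100 quality score to a tier label under the assumed cut-offs."""
    if score >= ESTABLISHED_MIN:
        return "Established"
    if score >= EMERGING_MIN:
        return "Emerging"
    return "Experimental"


def tier_counts(scores: list[int]) -> Counter:
    """Count how many scores fall into each tier."""
    return Counter(tier(s) for s in scores)
```

For these 21 scores, `tier_counts(SCORES)` gives 3 Established, 5 Emerging and 13 Experimental, matching the table. Any Emerging lower bound from 28 to 30 and any Established lower bound from 38 to 50 would reproduce the same labels. The exact thresholds therefore cannot be pinned down from this list alone.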