Sriram-PR/doc-scraper
Go web crawler to scrape documentation sites and convert content to clean Markdown for LLM ingestion (RAG, training data).
This tool helps AI engineers, data scientists, and machine learning practitioners gather documentation from websites to feed into their Large Language Models (LLMs). You supply the URLs of the documentation sites and identify the content areas to extract, and it outputs clean, structured Markdown files. These files can be added to an LLM's knowledge base or used as training data for new models.
Use this if you need to reliably collect and convert online technical documentation into a clean Markdown format for use with LLMs or RAG systems.
Not ideal if you're looking to scrape arbitrary web pages for general data extraction or if your primary goal isn't LLM training or RAG.
Stars: 86
Forks: 8
Language: Go
License: Apache-2.0
Category:
Last pushed: Feb 21, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/Sriram-PR/doc-scraper"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
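The same request can be made from Go, the language of the project itself. The following is a minimal sketch, not an official client. It assumes the endpoint accepts a plain unauthenticated GET, as the curl command suggests. The helper names `qualityURL` and `fetch` are hypothetical, and `qualityURL` assumes that other repositories follow the same `/<owner>/<repo>` path pattern as the URL above. The response format is not described here, so the body is printed as-is rather than decoded.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// baseURL is copied from the curl example above.
const baseURL = "https://pt-edge.onrender.com/api/v1/quality/rag"

// qualityURL is a hypothetical helper. It assumes the endpoint path is
// baseURL followed by "/<owner>/<repo>", as in the curl example.
func qualityURL(owner, repo string) string {
	return baseURL + "/" + url.PathEscape(owner) + "/" + url.PathEscape(repo)
}

// fetch performs a GET request and returns the raw response body.
// The response schema is not documented here, so nothing is decoded.
func fetch(client *http.Client, u string) ([]byte, error) {
	resp, err := client.Get(u)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return body, nil
}

func main() {
	// A timeout keeps the program from hanging if the service is slow.
	client := &http.Client{Timeout: 10 * time.Second}

	body, err := fetch(client, qualityURL("Sriram-PR", "doc-scraper"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	fmt.Println(string(body))
}
```

With the keyless limit of 100 requests/day, a program that polls this endpoint would likely want to cache the result rather than fetch it on every run.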