raintree-technology/docpull
Crawl any website and convert it to clean, AI-ready Markdown: an async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output
This tool helps you quickly gather information from any website, like product documentation or blog posts, and convert it into clean, organized Markdown files. You input a website URL, and it provides a collection of text files ready for various uses, especially for training AI models or building knowledge bases. Content managers, researchers, or anyone building AI applications can use this to easily extract and format web content.
Available on PyPI.
Use this if you need to extract structured, text-based content from websites for purposes like AI training, content archiving, or creating searchable knowledge bases.
Not ideal if you primarily need to extract images, videos, or highly interactive application data, as its main focus is on text content for AI readiness.
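To make the "web page in, clean Markdown out" idea concrete, here is a minimal sketch that uses only the Python standard library. It is not docpull's implementation or API. The class name `MarkdownSketch` and the function `html_to_markdown` are hypothetical, and the sketch handles only headings, paragraphs, and list items while dropping script, style, nav, and footer content.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class MarkdownSketch(HTMLParser):
    """Hypothetical, tiny HTML-to-Markdown converter (not docpull's code)."""

    SKIP = {"script", "style", "nav", "footer"}  # elements whose text is dropped

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # finished Markdown blocks
        self.buf = []        # raw text of the block being built
        self.prefix = ""     # Markdown prefix for the current block
        self.skip_depth = 0  # greater than 0 while inside a skipped element

    def flush(self):
        # Collapse whitespace and emit the current block if it has any text.
        text = " ".join("".join(self.buf).split())
        if text:
            self.parts.append(self.prefix + text)
        self.buf, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in HEADINGS:
            self.flush()
            self.prefix = "#" * int(tag[1]) + " "
        elif tag == "li":
            self.flush()
            self.prefix = "- "
        elif tag == "p":
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in HEADINGS or tag in {"li", "p"}:
            self.flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.buf.append(data)


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to Markdown blocks separated by blank lines."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    parser.flush()
    return "\n\n".join(parser.parts)
```

A real crawler would also need link handling, code blocks, tables, and rate-limited fetching. This sketch only illustrates the text-focused conversion step described above.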
Stars: 20
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Feb 06, 2026
Commits (30d): 0
Dependencies: 10
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/raintree-technology/docpull"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
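The same request can be made from Python. The sketch below builds the URL shown in the curl command and fetches it with the standard library. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code assumes the endpoint returns JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for a repository.

    Assumption: the response body is JSON. If it is not, json.loads raises
    a ValueError and the raw body should be inspected instead.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("raintree-technology", "docpull")` issues the same GET request as the curl command. With the keyless limit of 100 requests per day, caching the result locally is a sensible precaution.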
Higher-rated alternatives
any4ai/AnyCrawl
AnyCrawl: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...
kreuzberg-dev/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...
paulpierre/markdown-crawler
A multithreaded web crawler that recursively crawls a website and creates a markdown file...