raintree-technology/docpull

Crawl any website and convert it to clean, AI-ready Markdown — async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output

Emerging

This tool helps you quickly gather information from any website, like product documentation or blog posts, and convert it into clean, organized Markdown files. You input a website URL, and it provides a collection of text files ready for various uses, especially for training AI models or building knowledge bases. Content managers, researchers, or anyone building AI applications can use this to easily extract and format web content.
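The HTML-to-Markdown step described above can be illustrated with a small standard-library sketch. This is not docpull's implementation or API: the `MarkdownSketch` class and `html_to_markdown` function are hypothetical names introduced here, and the converter handles only headings, paragraphs, and list items while skipping `script`, `style`, and `nav` content.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical, not docpull's code)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIPPED = {"script", "style", "nav"}  # page chrome and non-content elements

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Markdown blocks
        self.prefix = None    # Markdown prefix of the block being collected
        self.buffer = []      # text fragments of the current block
        self.skip_depth = 0   # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.prefix = self.HEADINGS[tag]
        elif tag == "p":
            self.prefix = ""
        elif tag == "li":
            self.prefix = "- "

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("p", "li"):
            # Collapse runs of whitespace before emitting the block.
            text = " ".join("".join(self.buffer).split())
            if text and self.prefix is not None:
                self.blocks.append(self.prefix + text)
            self.prefix, self.buffer = None, []

    def handle_data(self, data):
        # Keep text only when inside a tracked block and outside skipped elements.
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to a simplified Markdown string."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)


sample = (
    "<html><nav><p>Menu</p></nav><h1>Guide</h1>"
    "<p>Hello <b>world</b>.</p><ul><li>One</li><li>Two</li></ul>"
    "<script>x=1</script></html>"
)
markdown = html_to_markdown(sample)
print(markdown)
```

A real crawler would also need to fetch pages, follow links, and handle many more HTML elements; the sketch only shows the general shape of turning page markup into text files suitable for a knowledge base.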

Available on PyPI.

Use this if you need to extract structured, text-based content from websites for purposes like AI training, content archiving, or creating searchable knowledge bases.

Not ideal if you primarily need to extract images, videos, or highly interactive application data, as its main focus is on text content for AI readiness.

Tags: content-management, knowledge-base-creation, AI-data-preparation, research-data-collection, documentation-archiving

Maintenance 10 / 25

Adoption 6 / 25

Maturity 22 / 25

Community 5 / 25

Language: Python

License: MIT

Higher-rated alternatives

any4ai/AnyCrawl

AnyCrawl 🚀: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...

kreuzberg-dev/html-to-markdown

High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...

ScrapeGraphAI/Scrapegraph-ai

Python scraper based on AI

adbar/trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...

paulpierre/markdown-crawler

A multithreaded 🕸️ web crawler that recursively crawls a website and creates a 🔽 markdown file...
