raintree-technology/docpull

Crawl any website and convert it to clean, AI-ready Markdown: an async Python CLI with MCP support, crawl profiles, caching, and RAG-optimized output

43 / 100 (Emerging)

This tool gathers content from a website, such as product documentation or blog posts, and converts it into clean, organized Markdown files. You give it a website URL, and it returns a collection of Markdown files suited to training AI models or building knowledge bases. Content managers, researchers, and developers of AI applications can use it to extract and format web content.

Available on PyPI.

Use this if you need to extract structured, text-based content from websites for purposes like AI training, content archiving, or creating searchable knowledge bases.

Not ideal if you primarily need to extract images, videos, or highly interactive application data, as its main focus is on text content for AI readiness.

content-management, knowledge-base-creation, AI-data-preparation, research-data-collection, documentation-archiving
Maintenance 10 / 25
Adoption 6 / 25
Maturity 22 / 25
Community 5 / 25
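
The four category scores listed above add up to the overall score shown at the top (10 + 6 + 22 + 5 = 43). The sketch below reproduces that arithmetic. It assumes the total is a plain unweighted sum, which matches the numbers on this page but is not confirmed as the site's scoring method.

```python
# Category scores copied from this page; each is out of 25.
category_scores = {
    "Maintenance": 10,
    "Adoption": 6,
    "Maturity": 22,
    "Community": 5,
}

# Assumption: the overall score is the unweighted sum of the categories.
total = sum(category_scores.values())
maximum = 25 * len(category_scores)

print(f"{total} / {maximum}")
```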


Stars: 20
Forks: 1
Language: Python
License: MIT
Last pushed: Feb 06, 2026
Commits (30d): 0
Dependencies: 10

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/raintree-technology/docpull"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
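
The curl request above can also be made from Python using only the standard library. This is a minimal sketch: the base URL comes from the curl example, while the helper names `quality_url` and `fetch_quality` are hypothetical, and it assumes the endpoint returns JSON, which this page does not state.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/rag"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score URL for an owner/repo pair (hypothetical helper)."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the score data, assuming the response body is JSON."""
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("raintree-technology", "docpull")` issues the same request as the curl command. Unauthenticated use counts against the 100 requests/day limit noted above.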