malvads/mojo
Non-sucking, cross-platform, extremely fast C++ crawler to convert entire websites into LLM-readable data
Mojo helps AI practitioners and data engineers quickly gather vast amounts of web content to train their AI models or build knowledge bases. It takes a list of website URLs and automatically converts them into clean, structured Markdown files, ready for ingestion by large language models (LLMs) or retrieval-augmented generation (RAG) systems. This is ideal for anyone who needs high-quality, organized web data for AI applications.
Use this if you need to rapidly collect and clean large datasets from websites for AI model training or to power AI-driven knowledge bases.
Not ideal if you're looking for a general-purpose web scraper for personal use or for extracting highly specific data points from just a few pages.
Stars: 12
Forks: 1
Language: C++
License: MIT
Last pushed: Feb 04, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/rag/malvads/mojo"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
any4ai/AnyCrawl
AnyCrawl: A Node.js/TypeScript crawler that turns websites into LLM-ready data and extracts...
kreuzberg-dev/html-to-markdown
High performance and CommonMark compliant HTML to Markdown converter. Maintained by the...
ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
adbar/trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping,...
paulpierre/markdown-crawler
A multithreaded web crawler that recursively crawls a website and creates a markdown file...