mlibre/Clean-Web-Scraper
A Node.js web scraper that extracts clean, readable content from websites - perfect for AI/LLM training datasets. Features smart crawling, Mozilla Readability integration, and organized content storage 🤖
Stars: 3
Forks: —
Language: JavaScript
License: —
Category:
Last pushed: Oct 25, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mlibre/Clean-Web-Scraper"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
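The same request can be made from a script. The following is a minimal Python sketch, not an official client: the helper names `quality_url` and `fetch_quality` are hypothetical, the owner/repo path layout is inferred from the curl example above, and the assumption that the endpoint returns JSON is not confirmed by this page. How a free key would be passed is not stated here, so the sketch makes only the keyless request.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example; everything after it is owner/repo.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"


def quality_url(owner: str, repo: str) -> str:
    """Build the per-repository URL (hypothetical helper, layout inferred from the curl example)."""
    return f"{BASE_URL}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumption: the response body is JSON. The page does not document the
    response schema, so the parsed value is returned as-is.
    """
    request = urllib.request.Request(
        quality_url(owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("mlibre", "Clean-Web-Scraper")` would issue the same GET as the curl command. With the keyless limit of 100 requests/day, it is probably worth caching the result locally rather than refetching on every run.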
Higher-rated alternatives
carlosplanchon/spidercreator
Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of...
raznem/parsera
Lightweight library for scraping web-sites with LLMs
rednafi/html-to-text
Extract pure text from any webpage
supadata-ai/js
Official TypeScript/JavaScript SDK for the Supadata API.
yeahhe365/JustSearch
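An autonomous AI search agent built on Playwright. Supports iterative task planning, deep web crawling, and multi-source knowledge synthesis with cited sources.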