mlibre/Clean-Web-Scraper

A Node.js web scraper that extracts clean, readable content from websites - perfect for AI/LLM training datasets. Features smart crawling, Mozilla Readability integration, and organized content storage 🤖

17 / 100

Experimental

No License · No Package · No Dependents

Maintenance 6 / 25

Adoption 3 / 25

Maturity 8 / 25

Community 0 / 25


Stars

Forks

—

Language: JavaScript
License: none
Category: llm-web-scraping
Last pushed: Oct 25, 2025

Commits (30d)


LLM Web Scraping · 31 tools

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/mlibre/Clean-Web-Scraper"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
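The same request can be made from Node.js. The following is a minimal sketch, assuming Node 18 or later (for the built-in `fetch`) and unauthenticated access. The helper names `buildQualityUrl` and `fetchQuality` are hypothetical; only the base URL and path come from the curl command above. The response format is not documented here, so the sketch returns the raw body text instead of assuming a schema.

```javascript
// Base URL taken from the curl example; everything else is illustrative.
const BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools";

// Build the endpoint URL for a given repository (hypothetical helper).
function buildQualityUrl(owner, repo) {
  return `${BASE_URL}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetch the quality data and return the raw response body as text.
// The response schema is an unknown here, so nothing is parsed.
async function fetchQuality(owner, repo) {
  const response = await fetch(buildQualityUrl(owner, repo));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

For example, `fetchQuality("mlibre", "Clean-Web-Scraper").then(console.log)` would print whatever the endpoint returns, subject to the daily request limit.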

Higher-rated alternatives

carlosplanchon/spidercreator

Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of...

raznem/parsera

Lightweight library for scraping web-sites with LLMs

rednafi/html-to-text

Extract pure text from any webpage

supadata-ai/js

Official TypeScript/JavaScript SDK for the Supadata API.

yeahhe365/JustSearch

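An autonomous AI search agent built on Playwright. Supports iterative task planning, deep web crawling, and multi-source knowledge integration with cited sources.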
