xn0tsa/Web2LLM

An advanced Python tool for extracting data from websites, cleaning the content, and converting it to high-quality Markdown for optimal use by LLM systems.


Experimental

Web2LLM extracts the core content of a web page and strips clutter such as navigation and ads, so that up-to-date website information can be fed to AI tools or large language models. It takes a web page URL and outputs a clean, structured Markdown file. It is aimed at AI trainers, data scientists, and content managers who build and maintain custom knowledge bases for AI.
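The general idea of dropping clutter and emitting Markdown can be illustrated with a minimal sketch. It is not Web2LLM's actual code: the names `MarkdownExtractor`, `html_to_markdown`, `SKIP_TAGS`, and `BLOCK_PREFIX` are hypothetical, the choice of which tags count as clutter is an assumption, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as clutter (an assumption for this sketch).
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style", "form"}
# Block-level tags mapped to the Markdown prefix they receive.
BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}


class MarkdownExtractor(HTMLParser):
    """Hypothetical helper: collects headings, paragraphs, and list items."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # > 0 while inside a clutter element
        self.current = None   # block tag currently being collected, if any
        self.buffer = []      # text fragments of the current block
        self.blocks = []      # finished Markdown blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_PREFIX:
            self.current = tag
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == self.current:
            # Collapse runs of whitespace before emitting the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(BLOCK_PREFIX[tag] + text)
            self.current = None

    def handle_data(self, data):
        # Keep text only when inside a wanted block and outside clutter.
        if self.skip_depth == 0 and self.current is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Convert an HTML string to a simple Markdown string."""
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

To work from a URL, the HTML string could first be fetched with `urllib.request.urlopen` and decoded before being passed to `html_to_markdown`. A real extractor would likely need more than a fixed tag list, for example scoring blocks by text density, which this sketch does not attempt.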

No commits in the last 6 months.

Use this if you need to reliably convert web pages into a clean, token-efficient Markdown format specifically optimized for AI comprehension, avoiding irrelevant website elements.

Not ideal if you need to preserve the exact visual layout or every single element of a webpage, as it's designed to strip away non-essential components.

AI Knowledge Base, LLM Data Preparation, Content Curation, Technical Documentation, AI Training Data

No License, Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 6 / 25

Maturity 8 / 25

Community 15 / 25


Language: Python

License: None

Featured in

Giving AI Agents Eyes: Browser Automation in 2026

Higher-rated alternatives

scrapy/scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

Altimis/Scweet

A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...

lexiforest/curl_cffi

Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...

plabayo/rama

modular service framework to move and transform network packets

scrapinghub/spidermon

Scrapy Extension for monitoring spiders execution.
