Pankaj3112/pluckr
Schema-first, self-healing HTML extraction powered by LLMs
This tool helps developers reliably extract structured data from HTML pages, even when website layouts change. You define the desired data structure using a Zod schema, provide the raw HTML, and it outputs the extracted data in a typed format. It's designed for developers building applications that need to pull specific information, like product details or article content, from web pages.
Use this if you need to programmatically extract structured information from websites and want to avoid the constant maintenance of traditional web scrapers that break with layout changes.
Not ideal if you're looking for a no-code solution or a general-purpose web crawler that indexes entire sites without specific data extraction goals.
Stars: 8
Forks: not listed
Language: TypeScript
License: not listed
Category:
Last pushed: Feb 24, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/Pankaj3112/pluckr"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
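The same request can be made from TypeScript. The following is a minimal sketch, not an official client: the function names `buildQualityUrl` and `fetchQuality` are hypothetical, the URL layout (category, owner, repo) is inferred from the curl example above, and the response is left as `unknown` because this page does not document its shape. It assumes a runtime with a global `fetch` (for example Node 18+ or a browser). How an API key is passed is not stated here, so the sketch uses only the keyless tier.

```typescript
// Base URL copied from the curl example on this page.
const API_BASE = "https://pt-edge.onrender.com/api/v1/quality";

// Build the endpoint URL for one repository.
// Assumption: the path is <category>/<owner>/<repo>, as in the curl example.
function buildQualityUrl(category: string, owner: string, repo: string): string {
  const parts = [category, owner, repo].map((p) => encodeURIComponent(p));
  return `${API_BASE}/${parts.join("/")}`;
}

// Fetch the record for one repository (hypothetical helper).
// The response schema is not documented on this page, so it is returned as `unknown`
// and should be validated by the caller before use.
async function fetchQuality(
  category: string,
  owner: string,
  repo: string
): Promise<unknown> {
  const res = await fetch(buildQualityUrl(category, owner, repo));
  if (!res.ok) {
    // Covers any non-2xx status, including a possible rate-limit response.
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.json();
}
```

Calling `fetchQuality("llm-tools", "Pankaj3112", "pluckr")` would issue the same GET request as the curl command. Since the keyless tier allows 100 requests/day, caching results locally is a reasonable precaution when polling many repositories.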
Higher-rated alternatives
carlosplanchon/spidercreator
Automated web scraping spider generation using Browser Use and LLMs. Streamline the creation of...
raznem/parsera
Lightweight library for scraping web-sites with LLMs
rednafi/html-to-text
Extract pure text from any webpage
supadata-ai/js
Official TypeScript/JavaScript SDK for the Supadata API.
yeahhe365/JustSearch
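An autonomous AI search agent built on Playwright. Supports iterative task planning, deep web crawling, and multi-source knowledge synthesis with cited sources.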