reader and scraping-agent-ai

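These are complements: vakra-dev/reader provides core web scraping and markdown-cleaning infrastructure, while hmshb/scraping-agent-ai adds agentic orchestration (LangGraph, Anthropic) to automate intelligent extraction workflows. Note that scraping-agent-ai currently uses Firecrawl as its scraping layer, so reader would slot in as an alternative backend rather than a component it already wraps.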
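The complementary split can be sketched as two layers behind a shared interface, so the scraping backend (reader, Firecrawl, or anything else) can be swapped without touching the orchestration. This is a hypothetical Python sketch: `Scraper`, `ExtractionAgent`, `StaticScraper`, and `headings_extractor` are illustrative names, not APIs of either project, and the regex extractor stands in for the LLM call an agent would make.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable, Protocol


class Scraper(Protocol):
    """Scraping layer (hypothetical interface): URL in, cleaned markdown out."""

    def scrape(self, url: str) -> str: ...


# Extraction layer: markdown in, structured record out. In an agent this
# step would be an LLM call; here it is any plain function.
Extractor = Callable[[str], dict]


@dataclass
class ExtractionAgent:
    """Orchestration layer: depends only on the two interfaces above."""

    scraper: Scraper
    extract: Extractor

    def run(self, url: str) -> dict:
        markdown = self.scraper.scrape(url)
        return {"url": url, **self.extract(markdown)}


class StaticScraper:
    """Stand-in backend that serves canned markdown (no network access)."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def scrape(self, url: str) -> str:
        return self.pages[url]


def headings_extractor(markdown: str) -> dict:
    """Stand-in for an LLM: collects level-1 and level-2 headings."""
    return {"headings": re.findall(r"^#{1,2} (.+)$", markdown, flags=re.MULTILINE)}


agent = ExtractionAgent(
    scraper=StaticScraper({"https://example.com": "# Pricing\nIntro\n## Plans\nDetails"}),
    extract=headings_extractor,
)
print(agent.run("https://example.com"))
# {'url': 'https://example.com', 'headings': ['Pricing', 'Plans']}
```

Swapping the backend means passing a different object with a `scrape` method; `ExtractionAgent` itself does not change.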

Metric           reader              scraping-agent-ai
Score            55 (Established)    37 (Emerging)
Maintenance      10/25               0/25
Adoption         10/25               6/25
Maturity         22/25               16/25
Community        13/25               15/25
Stars            474                 16
Forks            32                  5
Downloads        n/a                 n/a
Commits (30d)    0                   0
Language         TypeScript          Python
License          Apache-2.0          MIT
Risk flags       None                Stale 6m, No Package, No Dependents
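The overall scores are consistent with a simple sum of the four category scores, each out of 25: 10 + 10 + 22 + 13 = 55 for reader and 0 + 6 + 16 + 15 = 37 for scraping-agent-ai. The short Python check below encodes that reading; the summation rule is inferred from these numbers, not taken from the site's scoring documentation.

```python
from __future__ import annotations

# Category scores as listed for each project; each category is out of 25.
SCORES = {
    "reader": {"Maintenance": 10, "Adoption": 10, "Maturity": 22, "Community": 13},
    "scraping-agent-ai": {"Maintenance": 0, "Adoption": 6, "Maturity": 16, "Community": 15},
}


def overall(categories: dict[str, int]) -> int:
    """Assumed rule: overall score = sum of the four category scores (max 100)."""
    for name, value in categories.items():
        if not 0 <= value <= 25:
            raise ValueError(f"{name} score {value} is outside 0..25")
    return sum(categories.values())


for project, categories in SCORES.items():
    print(project, overall(categories))
# reader 55
# scraping-agent-ai 37
```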

About reader

vakra-dev/reader

Open-source, production-grade web scraping engine built for LLMs. Scrape and crawl the entire web, clean markdown, ready for your agents.

This project helps developers gather clean, structured web content for AI models and agents. It takes website URLs or entire domains as input, intelligently navigates complex sites, bypasses common anti-bot measures, and outputs cleaned content in markdown or HTML. It's designed for developers building applications that need reliable, large-scale web data.
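reader itself is written in TypeScript and its API is not shown here. The Python sketch below only illustrates the general idea of markdown cleaning: drop non-content elements such as scripts, styles, and navigation, and keep headings, paragraphs, list items, and links as markdown. The tag lists, `MarkdownCleaner`, and `html_to_markdown` are hypothetical, and a production engine handles far more cases than this toy does.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Elements treated as non-content and dropped entirely (an assumed, minimal list).
SKIP_TAGS = {"script", "style", "nav", "footer", "noscript"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownCleaner(HTMLParser):
    """Toy HTML-to-markdown converter: headings, paragraphs, list items, links."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []   # finished markdown blocks
        self.buf: list[str] = []      # text of the block being built
        self.skip_depth = 0           # > 0 while inside a SKIP_TAGS element
        self.href: str | None = None  # target of the currently open <a>

    def flush(self, prefix: str = "") -> None:
        text = " ".join("".join(self.buf).split())  # collapse whitespace
        if text:
            self.blocks.append(prefix + text)
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif not self.skip_depth and tag == "a":
            self.href = dict(attrs).get("href")
            self.buf.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return  # still inside dropped content
        elif tag == "a":
            self.buf.append(f"]({self.href})" if self.href else "]")
            self.href = None
        elif tag in HEADING_PREFIX:
            self.flush(HEADING_PREFIX[tag])
        elif tag == "li":
            self.flush("- ")
        elif tag in ("p", "div"):
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.buf.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownCleaner()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit trailing text that no block tag closed
    return "\n\n".join(parser.blocks)


page = """<html><head><style>body { margin: 0 }</style></head><body>
<nav><a href="/">Home</a></nav>
<h1>Reader</h1>
<p>Clean <a href="https://example.com">markdown</a> for agents.</p>
<ul><li>Scrape</li><li>Crawl</li></ul>
<script>track()</script>
</body></html>"""
print(html_to_markdown(page))
```

The navigation link, stylesheet, and script vanish, while the heading, paragraph link, and list items survive as markdown.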

AI development, web data collection, agent training data, content extraction, data pipeline

About scraping-agent-ai

hmshb/scraping-agent-ai

AI-powered web scraping agent built with LangGraph, LangSmith, Firecrawl, and Anthropic AI. Automates intelligent crawling, structured data extraction, and LLM-powered content formatting. Efficiently handles anti-bot mechanisms, error recovery, and batch processing. 🚀

This tool helps businesses and researchers automatically gather structured information from websites. You provide a list of URLs, and it intelligently navigates, bypasses common website defenses, extracts relevant content, and delivers it in a clean, organized format like JSON or Markdown. It's ideal for market analysts, competitive intelligence professionals, and data journalists who need to collect data at scale.
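scraping-agent-ai advertises error recovery and batch processing. The Python sketch below shows one common way to get both: retry each URL with exponential backoff, and record a URL that keeps failing instead of aborting the whole batch. It is not the project's code; `scrape_batch`, the injected `fetch` callable, and the retry parameters are hypothetical, and `fetch` stands in for whatever backend (Firecrawl, reader, or a plain HTTP client) actually retrieves the page.

```python
from __future__ import annotations

import json
import time
from typing import Callable


def scrape_batch(
    urls: list[str],
    fetch: Callable[[str], str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> list[dict]:
    """Fetch every URL, retrying failures with exponential backoff.

    A URL that still fails after max_attempts is recorded with its error,
    so one bad page never aborts the rest of the batch.
    """
    results = []
    for url in urls:
        for attempt in range(1, max_attempts + 1):
            try:
                content = fetch(url)
            except Exception as exc:  # assumption: every error is worth retrying
                if attempt == max_attempts:
                    results.append({"url": url, "ok": False, "error": str(exc), "attempts": attempt})
                else:
                    sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...
            else:
                results.append({"url": url, "ok": True, "content": content, "attempts": attempt})
                break
    return results


# Offline demo: a fake fetcher that times out once on /flaky and always fails on /blocked.
calls: dict[str, int] = {}


def fake_fetch(url: str) -> str:
    calls[url] = calls.get(url, 0) + 1
    if url.endswith("/blocked"):
        raise RuntimeError("HTTP 403")
    if url.endswith("/flaky") and calls[url] == 1:
        raise TimeoutError("timed out")
    return f"# Content of {url}"


report = scrape_batch(
    ["https://example.com/a", "https://example.com/flaky", "https://example.com/blocked"],
    fake_fetch,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print(json.dumps(report, indent=2))
```

Injecting `fetch` and `sleep` keeps the retry policy independent of the scraping backend and lets the demo run instantly without network access.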

market-research, competitive-intelligence, lead-generation, data-journalism, content-aggregation

Scores updated daily from GitHub, PyPI, and npm data.