Web Scraping Tools: ML Frameworks

Tools and frameworks for automatically extracting data from websites through web scraping, crawling, and HTML parsing. Does NOT include data cleaning libraries, NLP analysis tools, or downstream ML applications that use scraped data.
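The parsing and crawling steps these tools automate can be sketched with Python's standard library alone. This is a minimal illustration, not code from any listed project. The `fetch` callable is a hypothetical stand-in for an HTTP client, passed in so the example runs offline. A real crawler would also need politeness controls such as robots.txt checks and rate limiting, which are omitted here.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """HTML parsing step: collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page being parsed.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 10) -> list[str]:
    """Crawling step: breadth-first visit of pages, following each link once."""
    seen = {start_url}
    queue = deque([start_url])
    visited: list[str] = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # hypothetical fetcher; swap in a real HTTP client
        visited.append(url)
        for link in extract_links(html, url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # A tiny in-memory "site" so the example needs no network access.
    site = {
        "https://a.test/": '<a href="/b">b</a> <a href="/c">c</a>',
        "https://a.test/b": '<a href="/">home</a>',
        "https://a.test/c": "",
    }
    print(crawl("https://a.test/", lambda url: site.get(url, "")))
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, as `/b` does here.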

39 web scraping tools and frameworks are tracked. Two score above 50 (Established tier). The highest-rated is alirezamika/autoscraper at 57/100 with 7,122 stars.
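The tier labels appear to follow fixed score bands. The sketch below reproduces the labels for a sample of scores copied from the list on this page. The cutoffs are inferred from those scores (the lowest Established score is 51, the highest Emerging is 49, the lowest Emerging is 31, the highest Experimental is 29), so the exact boundaries at 50 and 30 are assumptions, not documented rules.

```python
from collections import Counter

# Scores copied from the list on this page (a sample, not all 39 entries).
SAMPLE_SCORES = {
    "alirezamika/autoscraper": 57,
    "YoongiKim/AutoCrawler": 51,
    "machine-learning-apps/Issue-Label-Bot": 49,
    "nuhmanpk/Webtrench": 48,
    "ganeshkavhar/Web-Scraping-in-python": 31,
    "udit-git/Python-WebScraper": 29,
    "memosasoft/wiki-nerd-1.0": 11,
}


def tier(score: int) -> str:
    """Map a 0-100 quality score to a tier label."""
    # ASSUMED cutoffs inferred from the listed scores, not a documented rule.
    if score > 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


def tier_counts(scores: dict[str, int]) -> dict[str, int]:
    """Count how many projects fall into each tier."""
    return dict(Counter(tier(s) for s in scores.values()))


if __name__ == "__main__":
    for repo, score in SAMPLE_SCORES.items():
        print(f"{repo}: {score} -> {tier(score)}")
    print(tier_counts(SAMPLE_SCORES))
```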

Get the projects as JSON. The example request sets limit=20, so it returns at most 20 of the 39 projects.

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ml-frameworks&subcategory=web-scraping-tools&limit=20"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
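The same request can be made from Python with only the standard library. This is a sketch that assumes the endpoint returns a JSON body; its schema is not described on this page, so the code returns the parsed value unchanged. Authenticated access is left out because the page does not say how the key is passed. `build_url` and `fetch_projects` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import urlencode

BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain: str, subcategory: str, limit: int = 20) -> str:
    """Build the same query string as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(domain: str = "ml-frameworks",
                   subcategory: str = "web-scraping-tools",
                   limit: int = 20,
                   timeout: float = 30.0) -> Any:
    """GET the endpoint and parse the body (ASSUMED to be JSON, schema unknown)."""
    url = build_url(domain, subcategory, limit)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Print the request URL only; call fetch_projects() to hit the network.
    print(build_url("ml-frameworks", "web-scraping-tools"))
```

If the server accepts a higher limit, `fetch_projects(limit=39)` should return every tracked project. Each call counts against the anonymous quota of 100 requests/day.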

Rank Framework Score Tier
1 alirezamika/autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

57
Established
2 YoongiKim/AutoCrawler

Google, Naver multiprocess image web crawler (Selenium)

51
Established
3 machine-learning-apps/Issue-Label-Bot

Code For The Issue Label Bot, an App that automatically labels issues using...

49
Emerging
4 nuhmanpk/Webtrench

A powerful and easy-to-use web scraper for collecting data from the web....

48
Emerging
5 lorey/mlscraper

🤖 Scrape data from HTML websites automatically by just providing examples

44
Emerging
6 shaohua0116/ICLR2020-OpenReviewData

Script that crawls meta data from ICLR OpenReview webpage. Tutorials on...

41
Emerging
7 tal95shah/OLX_Scraper

An OLX Scraper using Scrapy + MongoDB. It scrapes recent ads posted...

38
Emerging
8 gridaco/figma-archives

Figma Files Scraper for Research & Studies

37
Emerging
9 garysieling/video-crawler

Crawl websites for videos from Youtube, Vimeo, Soundcloud, etc

36
Emerging
10 Tuhin-thinks/instagram-unfollower-tracker-meerkit

Analyze Instagram followers, find unfollowers, automate follow/unfollow, and...

35
Emerging
11 NYX-VORAX/lightning-image-scraper

⚡ Lightning-fast Python image scraper | Download 10K+ images/min from any...

35
Emerging
12 DevGlitch/botwizer

Social media AI bot using computer vision to imitate human behaviors. Final...

34
Emerging
13 ganeshkavhar/Web-Scraping-in-python

ganesh kavhar python project

31
Emerging
14 udit-git/Python-WebScraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

29
Experimental
15 Tsujimar/tsuki-wscp

Web scraper for AI/ML training

29
Experimental
16 b1t0nese/MacLearn

A program that will assemble a high-quality dataset for you in just minutes...

27
Experimental
17 dpuentel/github-issues-labeller-cohere

This is a GitHub issue labeller. Insert the url of a repository and using...

23
Experimental
18 Eshtiaque/Multi-Agent-Instagram-bot

Instagram-bot

22
Experimental
19 OwenOrcan/YiraBot-Crawler

YiraBot: Simplifying Web Scraping for All. A user-friendly tool for...

22
Experimental
20 ismailazdad/stackoverflowTags

flask website that automatically assigns multiple relevant tags to a...

22
Experimental
21 zt8812/lightning-image-scraper

🖼️ Download thousands of images fast with asynchronous scraping and...

22
Experimental
22 YafetGetu/Data_scraper-from-jiji-ethiopia

A professional web scraping tool for extracting product listings from the...

21
Experimental
23 jigusp/urls-le

🔗 Extract thousands of URLs per second from various formats like HTML, JSON,...

21
Experimental
24 ArtificialOSS/WebCrawl

Crawls the web to generate a huge dataset for training

19
Experimental
25 BlazeInferno64/ScrapyPy

ScrapyPy is a free, open-source, and powerful web scraping tool that...

19
Experimental
26 MaximumOverflow/Philia

An easy to use imageboard scraper.

18
Experimental
27 Decodo/soundcloud-scraper

Scraper for SoundCloud that extracts audio metadata and download URLs using...

17
Experimental
28 Gulilil/nusava

Development of a social media bot for Instagram, Nusava.

15
Experimental
29 gabryelvieiramusico/instagram-content-intelligence-pro

📊 Transform Instagram content into actionable insights with AI-driven...

14
Experimental
30 gmk418/Python-web-scraping

🔍 Discover Python web scraping techniques, libraries, and examples to...

14
Experimental
31 bhavanaaroy-sketch/AI-Code-Complexity-Analyzer

AI-based tool to analyze code complexity using Python and Streamlit

14
Experimental
32 Mrsultan7890/crl

CRL: Pure Python crawler. The Semantic Web Crawler for AI & Security

14
Experimental
33 bright-data-de/web-scraping-for-machine-learning

Scrape web data for machine learning, set up ETL pipelines...

13
Experimental
34 ozgesadet/silver-invention

AI based tender finding

13
Experimental
35 Epsilon-Ventures/document-similarity-frontend

Major Project Frontend

13
Experimental
36 mate3424/easy-zoot-data-scraper

🛍️ Scrape structured fashion product data effortlessly from multiple...

13
Experimental
37 basilcherian42/Insta-Insights

Insta-Insights: A powerful tool to identify fake accounts on Instagram and...

11
Experimental
38 Billie-LS/Scraping-ML-Deep

Scraping web for ML and Deep Learning applications

11
Experimental
39 memosasoft/wiki-nerd-1.0

Scraper for a Wikipedia self-learning project. I was interested in how we read...

11
Experimental
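Some entries take an example-driven approach; lorey/mlscraper, for instance, describes scraping "by just providing examples" instead of hand-written selectors. The sketch below is a deliberately simplified, stdlib-only illustration of that idea, not the algorithm any listed project uses. It records the (tag, class) path above each example value on a training page, then returns every text node on a page whose trailing path matches. `learn_rules`, `apply_rules`, and the `depth` parameter are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class TextCollector(HTMLParser):
    """Record every non-empty text node with the (tag, class) stack above it."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, str]] = []
        self.items: list[tuple[tuple[tuple[str, str], ...], str]] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerates unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append((tuple(self.stack), text))


def _collect(html: str) -> list:
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.items


def learn_rules(html: str, examples: list[str], depth: int = 2) -> set:
    """Remember the last `depth` (tag, class) levels above each example value."""
    return {path[-depth:] for path, text in _collect(html) if text in examples}


def apply_rules(html: str, rules: set, depth: int = 2) -> list[str]:
    """Return every text node whose trailing (tag, class) path matches a rule."""
    return [text for path, text in _collect(html) if path[-depth:] in rules]


if __name__ == "__main__":
    train = ('<ul><li class="title">Alpha</li><li class="price">$1</li>'
             '<li class="title">Beta</li></ul>')
    rules = learn_rules(train, ["Alpha"])
    print(apply_rules(train, rules))
    other = '<div><ul><li class="title">Gamma</li><li class="price">$3</li></ul></div>'
    print(apply_rules(other, rules))
```

Matching only the last `depth` levels lets a rule survive extra wrapper elements, such as the `<div>` around the second page. A larger `depth` is stricter and less likely to pick up unrelated text, at the cost of breaking more easily when the page layout changes.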
