biraj21/web-wanderer

A multi-threaded web crawler written in Python, utilizing ThreadPoolExecutor and Playwright to efficiently crawl dynamically rendered web pages and download them.

Score: 26 / 100 (Experimental)

This tool gathers content from many pages of a website, including pages whose content is rendered by JavaScript. You provide a starting URL, and it downloads the visible content of the linked pages, saving each one into a designated folder on your computer. It suits anyone who needs to collect website content for research, archiving, or analysis.
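The crawl described above can be sketched as follows. This is a minimal illustration of the general ThreadPoolExecutor pattern, not the project's actual code. The names `crawl`, `extract_links`, `url_to_filename` and `playwright_fetch` are hypothetical. The sketch also assumes a few behaviors: only links on the starting host are followed, pages are visited breadth-first, and a page cap stops the crawl. Fetching is injected as a function, so the crawl logic can run without a browser.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to it."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url: str, html: str) -> set[str]:
    """Return absolute, fragment-free links that stay on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = set()
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(absolute)
    return links


def url_to_filename(url: str) -> str:
    """Hypothetical naming scheme: replace unsafe characters with '_'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", url) + ".html"


def playwright_fetch(url: str) -> str:
    """Render url in headless Chromium and return the resulting HTML.

    Each call starts its own Playwright instance because Playwright's API
    is not thread-safe. This is simple but slow.
    """
    from playwright.sync_api import sync_playwright  # optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page()
            page.goto(url)
            return page.content()
        finally:
            browser.close()


def crawl(start_url: str, fetch: Callable[[str], str], out_dir: str,
          max_pages: int = 50, workers: int = 8) -> list[str]:
    """Breadth-first crawl from start_url, saving each page into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    seen = {start_url}
    frontier = [start_url]
    saved: list[str] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier and len(saved) < max_pages:
            batch = frontier[: max_pages - len(saved)]
            frontier = []
            # pool.map fetches the batch concurrently; results come back in input order
            for url, html in zip(batch, pool.map(fetch, batch)):
                path = os.path.join(out_dir, url_to_filename(url))
                with open(path, "w", encoding="utf-8") as f:
                    f.write(html)
                saved.append(url)
                # Queue unseen same-host links for the next level
                for link in sorted(extract_links(url, html)):
                    if link not in seen:
                        seen.add(link)
                        frontier.append(link)
    return saved
```

To crawl a real site, pass `playwright_fetch` as `fetch`. This requires `pip install playwright` followed by `playwright install chromium`. In this sketch, an exception raised by `fetch` propagates out of `pool.map`. A more robust crawler would catch errors for each URL so that one failing page does not abort the whole run.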

No commits in the last 6 months.

Use this if you need to download and save content from a website, including those that load information dynamically after the initial page view.

Not ideal if you only need to extract specific data fields rather than entire page content, or if you require advanced data parsing and structuring.

web-scraping content-archiving market-research competitive-analysis data-collection
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 6 / 25
Maturity 16 / 25
Community 4 / 25


Stars: 22
Forks: 1
Language: Python
License: MIT
Last pushed: Nov 30, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/biraj21/web-wanderer"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
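The curl call above can be reproduced in Python using only the standard library. This is a sketch. It assumes the endpoint returns JSON, because the response schema isn't documented here. `perception_url` and `fetch_perception` are hypothetical helper names.

```python
import json
from typing import Any
from urllib.request import urlopen

# Base path taken from the curl example; the repository slug is appended
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"


def perception_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug."""
    return f"{API_BASE}/{repo}"


def fetch_perception(repo: str) -> Any:
    """GET the endpoint and decode the body, assuming it is JSON."""
    with urlopen(perception_url(repo), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_perception("biraj21/web-wanderer")` counts against the same daily request limit as the curl command.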