phase3dev/sitemap-extract

Processes XML sitemaps and extracts the URLs they list. Features include support for both plain and compressed XML files, multiple input sources, measures to avoid being blocked by anti-bot systems, multi-threading, and automatic processing of nested sitemaps.

46 / 100 (Emerging)

This tool helps SEO specialists, market researchers, or data analysts gather extensive lists of URLs from websites. You provide it with a sitemap URL (or a list of them), and it outputs a clean, comprehensive list of all discovered URLs, even from large or complex sitemaps. It's designed for users who need to extract website structure without being blocked by anti-bot systems.
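The nested-sitemap handling described above can be illustrated with a minimal sketch. This is not the project's own code: the function names `parse_sitemap` and `extract_urls`, the in-memory `pages` dictionary, and the assumption that "compressed" means gzip are all hypothetical choices made for illustration. It uses only the Python standard library and takes a caller-supplied `fetch` function, so it runs without network access.

```python
import gzip
import xml.etree.ElementTree as ET

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def _local(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(data):
    """Return (kind, locs); kind is the root tag, e.g. 'urlset' or 'sitemapindex'."""
    if data[:2] == GZIP_MAGIC:  # assumption: compressed sitemaps are gzip
        data = gzip.decompress(data)
    root = ET.fromstring(data)
    locs = []
    for entry in root:  # <url> or <sitemap> entries
        for child in entry:  # direct children only, so nested image <loc> tags are skipped
            if _local(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    return _local(root.tag), locs


def extract_urls(start, fetch, seen=None):
    """Collect page URLs from `start`, following sitemap indexes recursively."""
    seen = set() if seen is None else seen
    if start in seen:  # guard against indexes that reference each other
        return []
    seen.add(start)
    kind, locs = parse_sitemap(fetch(start))
    if kind != "sitemapindex":
        return locs
    urls = []
    for child in locs:
        urls.extend(extract_urls(child, fetch, seen))
    return urls


# Hypothetical in-memory "site": one index pointing at a plain and a gzipped sitemap.
NS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
pages = {
    "https://example.com/sitemap.xml": (
        f"<sitemapindex {NS}>"
        "<sitemap><loc>https://example.com/a.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/b.xml.gz</loc></sitemap>"
        "</sitemapindex>"
    ).encode(),
    "https://example.com/a.xml": (
        f"<urlset {NS}><url><loc>https://example.com/1</loc></url></urlset>"
    ).encode(),
    "https://example.com/b.xml.gz": gzip.compress(
        f"<urlset {NS}><url><loc>https://example.com/2</loc></url></urlset>".encode()
    ),
}

urls = extract_urls("https://example.com/sitemap.xml", pages.__getitem__)
print(urls)
```

Passing `fetch` in as a parameter keeps the parsing logic separate from how documents are downloaded, so a real HTTP client could be substituted for the dictionary lookup.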

Use this if you need to reliably extract all URLs from a website's sitemap, especially for large sites or those with anti-bot measures, and want the flexibility to use proxies or customize request behavior.

Not ideal if you only need to check a single, small sitemap occasionally and don't require advanced evasion techniques or high-volume processing.
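As a rough illustration of what "use proxies or customize request behavior" can mean in practice, the sketch below prepares a request with a custom User-Agent header and an optional proxy using Python's standard `urllib.request`. It does not reflect how sitemap-extract itself is implemented or configured; the function name `prepare_request`, the user-agent string, and the proxy address are hypothetical placeholders.

```python
import urllib.request


def prepare_request(url, user_agent, proxy=None):
    """Build an opener and a request object without sending anything yet."""
    handlers = []
    if proxy:
        # Route both HTTP and HTTPS traffic through the given proxy (hypothetical address).
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    return opener, request


opener, request = prepare_request(
    "https://example.com/sitemap.xml",
    user_agent="example-sitemap-reader/0.1",  # placeholder value
    proxy="http://127.0.0.1:8080",  # placeholder value
)
# Calling opener.open(request, timeout=10) would then perform the actual fetch.
```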

SEO-auditing market-research competitive-analysis website-crawling data-extraction
No Package, No Dependents
Maintenance 10 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 15 / 25

Stars: 11
Forks: 4
Language: Python
License: MIT
Category: scraper
Last pushed: Mar 12, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/perception/phase3dev/sitemap-extract"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.