crwlrsoft/robots-txt
Robots Exclusion Standard/Protocol Parser for Web Crawling/Scraping
When building a web crawler or scraper, this helps you interpret a website's robots.txt file. You feed it the rules from a website's robots.txt and your crawler's identifier, and it tells you which parts of the site your crawler is allowed to visit. Web scraping developers and data engineers use this to ensure their crawlers respect website access policies.
Use this if you are programming a web crawler and need to automatically determine if your crawler is permitted to access specific web pages or directories.
Not ideal if you are manually checking robots.txt files or need a tool for general web browsing, as this is specifically for automated crawler programs.
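To make the workflow concrete: the library itself is a PHP package, but the same question — "given this robots.txt and this crawler identifier, may I fetch this path?" — can be sketched with Python's standard-library `urllib.robotparser`. This is an illustrative analogue, not the crwlrsoft/robots-txt API; the user-agent names and rules below are made up for the example.

```python
from urllib import robotparser

# Hypothetical robots.txt content. A group for a specific user agent
# takes precedence over the wildcard (*) group, per the Robots
# Exclusion Protocol.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: MyCrawler
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# MyCrawler matches its own group, so only Disallow: /private/ applies:
print(parser.can_fetch("MyCrawler", "/private/data"))  # False
print(parser.can_fetch("MyCrawler", "/admin/"))        # True
# Any other crawler falls back to the wildcard group:
print(parser.can_fetch("OtherBot", "/admin/"))         # False
```

A crawler would typically run this check before every request and skip URLs for which the answer is negative; crwlrsoft/robots-txt plays the same gatekeeping role inside a PHP crawler.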
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/crwlrsoft/robots-txt"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers, ...
lexiforest/curl_cffi
Python binding for the curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
modular service framework to move and transform network packets
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.