bytewife/substack_scraper
A scraper for Substack article text content
This tool helps you gather public article content from multiple Substack blogs. You provide the URLs of the Substack blogs, and it downloads their posts as individual text files. This is useful for researchers, content analysts, or anyone looking to collect public Substack content for analysis or training data.
No commits in the last 6 months.
Use this if you need to quickly extract all publicly available text content from several Substack newsletters into a collection of simple text files.
Not ideal if you need to access content from subscriber-only articles, as it will only capture the publicly visible, truncated portions.
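As a rough illustration of the "one text file per post" output described above, the following Python sketch writes already-downloaded post text into a per-blog directory. It is not the tool's own code (the tool is written in Rust, and its actual file layout is not described here); the helper names `blog_dir_name`, `slugify`, and `save_posts`, and the directory scheme, are assumptions made for this example.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def blog_dir_name(blog_url: str) -> str:
    """Derive a directory name from a blog URL (hypothetical scheme: the hostname)."""
    host = urlparse(blog_url).netloc or blog_url
    return host.lower()


def slugify(title: str) -> str:
    """Turn a post title into a filesystem-friendly file stem."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug or "untitled"


def save_posts(blog_url, posts, out_root):
    """Write each (title, text) pair to its own .txt file under out_root/<blog>/.

    `posts` is assumed to hold text that was already fetched; for
    subscriber-only articles that would be only the public, truncated part.
    """
    target = Path(out_root) / blog_dir_name(blog_url)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for title, text in posts:
        path = target / f"{slugify(title)}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

Calling `save_posts("https://example.substack.com", [("Hello, World!", "post body")], "out")` would create `out/example.substack.com/hello-world.txt` under this assumed layout.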
Stars: 32
Forks: 6
Language: Rust
License: MIT
Category: (none listed)
Last pushed: Oct 05, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/bytewife/substack_scraper"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
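The same request can be made from a script. The sketch below uses only the Python standard library and the endpoint shown in the curl command; the function names `perception_url` and `fetch_perception` are hypothetical, and because the response format is not documented on this page, the body is returned as raw text rather than parsed.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base path taken from the curl example on this page.
BASE = "https://pt-edge.onrender.com/api/v1/quality/perception"


def perception_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for a given owner/repo pair."""
    return f"{BASE}/{quote(owner, safe='')}/{quote(repo, safe='')}"


def fetch_perception(owner: str, repo: str, timeout: float = 10.0) -> str:
    """Fetch the raw response body as text (format assumed unknown, so not parsed)."""
    with urlopen(perception_url(owner, repo), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `fetch_perception("bytewife", "substack_scraper")` issues the same GET request as the curl command. Unauthenticated use counts against the 100 requests/day limit, so caching the result locally may be worthwhile.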
Higher-rated alternatives
scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Altimis/Scweet
A simple and unlimited Twitter scraper: scrape tweets, likes, retweets, following, followers,...
lexiforest/curl_cffi
Python binding for curl-impersonate fork via cffi. An HTTP client that can impersonate browser...
plabayo/rama
modular service framework to move and transform network packets
scrapinghub/spidermon
Scrapy Extension for monitoring spiders execution.