yusuzech/r-web-scraping-cheat-sheet
Guide, reference and cheatsheet on web scraping using rvest, httr and RSelenium.
This is a comprehensive guide for anyone looking to extract data from websites using R. It details how to use `rvest`, `httr`, and `RSelenium` to turn web page content into structured data like lists or data frames. Data scientists, researchers, or analysts who work with R and need to gather information directly from web sources will find this useful.
399 stars. No commits in the last 6 months.
Use this if you are an R user and need to systematically collect data from websites, ranging from simple static pages to complex, JavaScript-heavy sites or those requiring login.
Not ideal if you prefer Python for web scraping, or if you only need a very basic one-off data pull that can be done manually.
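As a rough illustration of what "turning web page content into structured data" can look like with `rvest`, here is a minimal sketch. It is not taken from the cheat sheet itself: the HTML snippet, the `#prices` selector, and the variable names are all made up so the example runs offline. A real scrape would pass a URL to `read_html()` instead of a literal string.

```r
# Illustrative sketch only: the HTML below is invented so the example
# needs no network access. Assumes rvest >= 1.0.0 is installed.
library(rvest)

html <- '
<html><body>
  <h1>Fruit prices</h1>
  <table id="prices">
    <tr><th>item</th><th>price</th></tr>
    <tr><td>apple</td><td>1.20</td></tr>
    <tr><td>pear</td><td>0.95</td></tr>
  </table>
</body></html>'

# Parse the document (read_html() also accepts a URL or file path).
page <- read_html(html)

# Pick one node with a CSS selector and extract its text.
title <- page %>% html_element("h1") %>% html_text2()

# Convert the HTML table into a data frame (a tibble).
prices <- page %>% html_element("#prices") %>% html_table()

print(title)
print(prices)
```

Static pages like this one are the simplest case. Pages that need custom requests or a logged-in session are typically where `httr` comes in, and JavaScript-heavy pages are where `RSelenium` does.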
Stars: 399
Forks: 101
Language: R
License: MIT
Category:
Last pushed: Dec 20, 2022
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/perception/yusuzech/r-web-scraping-cheat-sheet"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
seleniumbase/SeleniumBase
APIs for browser automation, testing, and bypassing bot-detection.
apify/crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers....
intoli/user-agents
A JavaScript library for generating random user agents with data that's updated daily.
apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In...
Kaliiiiiiiiii-Vinyzu/patchright
Undetected version of the Playwright testing and automation library.