Tsujimar/tsuki-wscp
Web scraper for AI/ML training
This tool helps AI/ML practitioners gather large datasets from social media platforms such as 4chan, Reddit, and Twitter. You specify the sources to scrape; the tool extracts posts or messages and stores them directly in your PostgreSQL database. It is designed for data scientists, machine learning engineers, and researchers who need large volumes of social media text to train their models.
No commits in the last 6 months.
Use this if you need to rapidly collect high volumes of social media text data from specific platforms to train your AI or machine learning models.
Not ideal if you need to scrape data from websites other than the supported social media platforms, or if you prefer a tool with a graphical interface.
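The storage step in the description can be illustrated with a minimal sketch. This is not tsuki-wscp's code. The `Post` record, the `posts` table, and the `store_posts` helper are hypothetical, and the sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a database server. The `ON CONFLICT (...) DO NOTHING` clause is accepted by both PostgreSQL and SQLite 3.24+, so re-running a scrape does not duplicate posts that are already stored.

```python
import sqlite3
from dataclasses import dataclass


# Hypothetical record for one scraped post; tsuki-wscp's real data model is not shown here.
@dataclass(frozen=True)
class Post:
    source: str   # platform name, e.g. "reddit", "4chan", "twitter"
    post_id: str  # platform-specific identifier
    text: str     # the post or message body


# Hypothetical table layout; (source, post_id) identifies a post uniquely.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    source  TEXT NOT NULL,
    post_id TEXT NOT NULL,
    text    TEXT NOT NULL,
    PRIMARY KEY (source, post_id)
)
"""


def store_posts(conn, posts):
    """Insert posts, skipping any (source, post_id) pair already stored.

    Returns the number of newly inserted rows.
    """
    conn.execute(SCHEMA)
    before = conn.total_changes
    # ON CONFLICT ... DO NOTHING works in PostgreSQL and SQLite 3.24+.
    # With a PostgreSQL driver such as psycopg2, the placeholders would be %s instead of ?.
    conn.executemany(
        "INSERT INTO posts (source, post_id, text) VALUES (?, ?, ?) "
        "ON CONFLICT (source, post_id) DO NOTHING",
        [(p.source, p.post_id, p.text) for p in posts],
    )
    conn.commit()
    # Rows skipped by DO NOTHING are not counted as changes.
    return conn.total_changes - before
```

For example, storing `[Post("reddit", "abc", "hello"), Post("reddit", "abc", "hello again"), Post("4chan", "123", "hi")]` into a fresh `sqlite3.connect(":memory:")` connection inserts two rows, because the second post repeats an existing `(source, post_id)` key; storing the same batch again inserts none.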
Stars: 37
Forks: 2
Language: Python
License: MIT
Category: (not listed)
Last pushed: Aug 04, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Tsujimar/tsuki-wscp"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
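The same request can be made from Python using only the standard library. This is a sketch: the helper names `quality_url` and `fetch_quality` are hypothetical, the response body is assumed to be JSON (its fields are not documented on this page), and the way a free key is passed to get the 1,000/day limit is not described here, so keyed requests are left out.

```python
import json
import urllib.request

# Base path taken from the curl example; the category/owner/repo segments follow it.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL in the same shape as the curl example."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send an unauthenticated GET request and decode the body as JSON.

    Assumption: the endpoint returns JSON. No API key is sent, so the
    keyless limit (100 requests/day) applies.
    """
    with urllib.request.urlopen(quality_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("ml-frameworks", "Tsujimar", "tsuki-wscp")` requests the same URL as the curl command above; network errors surface as `urllib.error.URLError`.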
Higher-rated alternatives
alirezamika/autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
YoongiKim/AutoCrawler
Google, Naver multiprocess image web crawler (Selenium)
machine-learning-apps/Issue-Label-Bot
Code For The Issue Label Bot, an App that automatically labels issues using machine learning,...
nuhmanpk/Webtrench
A powerful and easy-to-use web scraper for collecting data from the web. Supports scraping of...
lorey/mlscraper
🤖 Scrape data from HTML websites automatically by just providing examples