Tsujimar/tsuki-wscp

Web scraper for AI/ML training

Score: 29 / 100 (Experimental)

This tool helps AI/ML practitioners gather large datasets from social media platforms such as 4chan, Reddit, and Twitter. You specify the sources you want, and it extracts posts or messages and stores them directly in your PostgreSQL database. It is designed for data scientists, machine learning engineers, and researchers who need large volumes of social media text to train their models.
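To make the extract-and-store flow concrete, here is a minimal sketch of normalizing posts from different platforms into a single table. Everything in it is hypothetical: the `Post` record, the `posts` schema, and the `store_posts` helper are illustrations, not code from tsuki-wscp. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL so it runs without a server. With a PostgreSQL driver such as psycopg the placeholders would be `%s` instead of `?`, while the `ON CONFLICT ... DO NOTHING` clause works in both databases.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical normalized record for one scraped post or message."""
    platform: str  # e.g. "4chan", "reddit", "twitter"
    post_id: str   # platform-specific identifier
    text: str      # raw text body kept for training


# Hypothetical schema; the composite key lets repeated scrapes skip stored posts.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    platform TEXT NOT NULL,
    post_id  TEXT NOT NULL,
    text     TEXT NOT NULL,
    PRIMARY KEY (platform, post_id)
)
"""


def store_posts(conn: sqlite3.Connection, posts: list[Post]) -> int:
    """Insert posts, ignoring duplicates; return how many new rows were added."""
    before = conn.total_changes
    conn.executemany(
        "INSERT INTO posts (platform, post_id, text) VALUES (?, ?, ?) "
        "ON CONFLICT (platform, post_id) DO NOTHING",
        [(p.platform, p.post_id, p.text) for p in posts],
    )
    conn.commit()
    # Rows skipped by DO NOTHING are not counted in total_changes.
    return conn.total_changes - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    batch = [Post("reddit", "abc1", "first post"), Post("4chan", "123", "second post")]
    print(store_posts(conn, batch))  # 2
    print(store_posts(conn, batch))  # 0, both are duplicates
```

Keying on `(platform, post_id)` is one way to make repeated scraping runs idempotent; a real pipeline would likely also store timestamps, thread or subreddit context, and source metadata.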

No commits in the last 6 months.

Use this if you need to rapidly collect high volumes of social media text data from specific platforms to train your AI or machine learning models.

Not ideal if you need to scrape data from websites other than the supported social media platforms, or if you prefer a tool with a graphical interface.

AI-dataset-collection social-media-intelligence machine-learning-data natural-language-processing data-acquisition
Stale 6m · No Package · No Dependents
Maintenance 0 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 6 / 25
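The four sub-scores appear to add up to the headline figure: 0 + 7 + 16 + 6 = 29, out of a maximum of 4 × 25 = 100. The listing does not spell out the formula, so the quick check below assumes a plain unweighted sum; `overall_score` is a hypothetical helper, not part of the site's API.

```python
# Sub-scores as listed on this page; each category is out of 25.
SUBSCORES = {"Maintenance": 0, "Adoption": 7, "Maturity": 16, "Community": 6}
MAX_PER_CATEGORY = 25


def overall_score(subscores: dict[str, int]) -> tuple[int, int]:
    """Return (total, maximum), assuming the overall score is a plain sum."""
    total = sum(subscores.values())
    maximum = MAX_PER_CATEGORY * len(subscores)
    return total, maximum


if __name__ == "__main__":
    total, maximum = overall_score(SUBSCORES)
    print(f"{total} / {maximum}")  # 29 / 100
```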

Stars: 37
Forks: 2
Language: Python
License: MIT
Last pushed: Aug 04, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/Tsujimar/tsuki-wscp"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
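The same request can be made from Python with only the standard library. This is a sketch under two assumptions: that the endpoint returns JSON, and that the `ml-frameworks` path segment is a category that could vary for other repositories. The helper names `quality_url` and `fetch_quality` are hypothetical, and the code sends no API key because the page does not say how a key is passed.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(owner: str, repo: str, category: str = "ml-frameworks") -> str:
    """Build the same URL the curl example requests, escaping each path segment."""
    parts = (category, owner, repo)
    path = "/".join(urllib.parse.quote(part, safe="") for part in parts)
    return f"{BASE_URL}/{path}"


def fetch_quality(owner: str, repo: str, category: str = "ml-frameworks",
                  timeout: float = 10.0) -> Any:
    """GET the endpoint (unauthenticated) and decode the body, assuming JSON."""
    with urllib.request.urlopen(quality_url(owner, repo, category), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(quality_url("Tsujimar", "tsuki-wscp"))
    # fetch_quality("Tsujimar", "tsuki-wscp") counts against the 100 requests/day limit.
```

Building the URL separately from the network call keeps the path logic easy to check offline and avoids spending the unauthenticated daily quota during testing.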