FunnySaltyFish/bilibili_comments_crawl
Builds dialogue datasets for large language model training from Bilibili comment section data
This project helps content creators, social media analysts, or market researchers understand audience engagement on Bilibili videos. It takes a Bilibili video's ID and your login credentials to collect all associated comments and replies. The output is a structured conversation dataset, revealing how users interact and form discussion threads. This is for anyone interested in deep-diving into natural, multi-turn conversations from online video communities.
No commits in the last 6 months.
Use this if you need to gather authentic, multi-turn Chinese dialogue data from Bilibili video comment sections for qualitative analysis or to train conversational AI models.
Not ideal if you need a general-purpose web scraper for various websites, or if you require data beyond Bilibili comment threads.
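As a rough illustration of what a "structured conversation dataset" built from comment threads could look like, the sketch below turns flat comment records into multi-turn conversations, one per root-to-leaf reply chain. The record fields (`id`, `parent`, `user`, `text`) and the function name `build_conversations` are hypothetical and chosen for this example only; they are not taken from the project, whose actual schema and method may differ.

```python
from collections import defaultdict

def build_conversations(comments):
    """Group flat comment records into multi-turn conversations.

    Each record is a dict with hypothetical keys: 'id', 'parent'
    (None for a top-level comment), 'user', and 'text'.
    Returns one list of turns per root-to-leaf reply chain.
    """
    # Index replies by the id of the comment they answer.
    children = defaultdict(list)
    for comment in comments:
        children[comment["parent"]].append(comment)

    conversations = []

    def walk(node, path):
        path = path + [{"user": node["user"], "text": node["text"]}]
        replies = children.get(node["id"], [])
        if not replies:
            # Keep only chains with at least one reply (multi-turn).
            if len(path) > 1:
                conversations.append(path)
            return
        for reply in replies:
            walk(reply, path)

    for root in children.get(None, []):
        walk(root, [])
    return conversations

# Made-up sample data, for illustration only.
sample_comments = [
    {"id": 1, "parent": None, "user": "A", "text": "Great video!"},
    {"id": 2, "parent": 1, "user": "B", "text": "Which part did you like?"},
    {"id": 3, "parent": 2, "user": "A", "text": "The ending."},
    {"id": 4, "parent": 1, "user": "C", "text": "Agreed."},
    {"id": 5, "parent": None, "user": "D", "text": "First!"},
]

for conversation in build_conversations(sample_comments):
    print(" -> ".join(turn["text"] for turn in conversation))
```

Emitting one conversation per reply chain keeps each sample linear, which suits turn-by-turn training formats; comments that received no replies are dropped because they carry no dialogue.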
Stars: 59
Forks: 4
Language: Python
License: —
Category:
Last pushed: Dec 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/FunnySaltyFish/bilibili_comments_crawl"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
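The same request can be made from Python. This is a minimal sketch using only the standard library; the helper names `quality_url` and `fetch_quality` are made up for this example, and it assumes the endpoint returns a JSON body, which the listing does not state.

```python
import json
import urllib.parse
import urllib.request

# Base path copied from the curl command above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools"

def quality_url(owner, repo):
    """Build the request URL for a repository (hypothetical helper)."""
    return f"{BASE_URL}/{urllib.parse.quote(owner)}/{urllib.parse.quote(repo)}"

def fetch_quality(owner, repo, timeout=10):
    """Fetch and decode the response.

    Assumption: the endpoint responds with JSON; the response schema
    is not documented in this listing, so the result is returned as-is.
    """
    with urllib.request.urlopen(quality_url(owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

print(quality_url("FunnySaltyFish", "bilibili_comments_crawl"))
```

Calling `fetch_quality("FunnySaltyFish", "bilibili_comments_crawl")` performs the network request and counts against the daily request limit mentioned above.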
Higher-rated alternatives
AI-Planning/l2p
Library for LLM-driven action model acquisition via natural language
datawhalechina/self-llm
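"Open-Source LLM Cookbook": a tutorial tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment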
microsoft/LMOps
General technology for enabling AI capabilities w/ LLMs and MLLMs
theaniketgiri/create-llm
The fastest way to build and start training your own LLM. CLI tool that scaffolds...
liguodongiot/llm-action
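This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and real-world application deployment)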