VikParuchuri/textbook_quality
Generate textbook-quality synthetic LLM pretraining data
This project helps researchers and educators quickly create high-quality, comprehensive course materials and educational content. You provide a general subject or specific topics, and it generates extensive, well-structured "textbooks" for those areas. The primary users are individuals who need to develop large volumes of educational content without starting from scratch.
509 stars. No commits in the last 6 months.
Use this if you need to quickly generate large amounts of detailed, high-quality course material, or pretraining data for language models, on a specific subject.
Not ideal if you need short, conversational content or if you require highly creative, non-factual writing.
Stars: 509
Forks: 49
Language: Python
License: MIT
Last pushed: Oct 19, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/VikParuchuri/textbook_quality"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
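The same request can be made from Python with only the standard library. This is a minimal sketch. It assumes the endpoint returns JSON, and it treats the `transformers` path segment from the curl example as an opaque, configurable value (its meaning is not documented here). `build_quality_url` and `fetch_quality` are hypothetical helper names. No API key is sent because the header or parameter name for keys is not stated above.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner: str, repo: str, segment: str = "transformers") -> str:
    """Build the per-repository URL.

    `segment` mirrors the 'transformers' path part of the curl example;
    what it represents is an assumption, so it is left configurable.
    """
    return f"{BASE_URL}/{quote(segment)}/{quote(owner)}/{quote(repo)}"


def fetch_quality(owner: str, repo: str, segment: str = "transformers",
                  timeout: float = 10.0):
    """GET the endpoint anonymously (keyless tier) and decode the body as JSON.

    Assumes the response body is JSON; raises urllib.error.HTTPError on
    non-2xx responses, e.g. after the daily request limit is reached.
    """
    url = build_quality_url(owner, repo, segment)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Prints the URL only; call fetch_quality(...) to hit the network.
    print(build_quality_url("VikParuchuri", "textbook_quality"))
```

Calling `fetch_quality("VikParuchuri", "textbook_quality")` issues the same GET as the curl command and counts toward the keyless daily limit.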
Related models
dmanuel64/codablellm
A framework for creating and curating high-quality code datasets tailored for large language models
BhabhaAI/dataformer
Solving data for LLMs - Create quality synthetic datasets!
BothBosu/Synthetic-Data-for-Scam-Detection-Leveraging-LLMs-to-Train-Deep-Learning-Models
This repository contains the source code and synthetic datasets used in the research on scam...
iiis-ai/TemplateMath
[ICLR 2025 DATA-FM] Training and Evaluating Language Models with Template-based Data Generation...
MichiganNLP/depression_synthetic_data
Can LMs generate useful synthetic data for the mental health domain?