VikParuchuri/textbook_quality

Generate textbook-quality synthetic LLM pretraining data

43 / 100

Emerging

This project helps researchers and educators quickly create comprehensive course materials and educational content. You provide a general subject or a list of specific topics, and it generates extensive, well-structured "textbooks" covering them. It is aimed at people who need to produce large volumes of educational content without starting from scratch.

509 stars. No commits in the last 6 months.

Use this if you need to generate large volumes of detailed course material, or pretraining data for language models, on a specific subject.

Not ideal if you need short, conversational content or highly creative, non-factual writing.

education-content-creation, course-development, curriculum-design, technical-writing, elearning-materials

Stale 6m, No Package, No Dependents

Maintenance 0 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 17 / 25


Stars: 509

Forks:

Language: Python

License: MIT

Related models

dmanuel64/codablellm

A framework for creating and curating high-quality code datasets tailored for large language models

BhabhaAI/dataformer

Solving data for LLMs - Create quality synthetic datasets!

BothBosu/Synthetic-Data-for-Scam-Detection-Leveraging-LLMs-to-Train-Deep-Learning-Models

This repository contains the source code and synthetic datasets used in the research on scam...

iiis-ai/TemplateMath

[ICLR 2025 DATA-FM] Training and Evaluating Language Models with Template-based Data Generation...

MichiganNLP/depression_synthetic_data

Can LMs generate useful synthetic data for the mental health domain?
