cxcscmu/Montessori-Instruct

Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]

Score: 34 / 100 (Emerging)

This project helps machine learning practitioners generate high-quality training data specifically designed to improve the learning process of smaller language models. You input an existing dataset and choose a 'teacher' language model and a 'student' language model. The output is a refined dataset that allows the student model to learn more effectively, ultimately leading to better performance in instruction-following tasks.

No commits in the last 6 months.

Use this if you need to create specialized training datasets that are optimally suited for a smaller language model to learn specific instruction-following behaviors.

Not ideal if you are not working with large language models or do not need fine-grained control over data synthesis for student model improvement.

AI Training Data, Large Language Model Fine-tuning, Model Optimization, Instruction Following, Machine Learning Engineering
Stale 6m, No Package, No Dependents
Maintenance 0 / 25
Adoption 8 / 25
Maturity 16 / 25
Community 10 / 25
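
The four category scores listed above add up to the overall 34 / 100 figure. The short sketch below checks that arithmetic. It assumes the overall score is a plain sum of the four 25-point categories, which matches the numbers shown here but is not confirmed by the page; the function name `overall_score` is hypothetical.

```python
# Category sub-scores as listed on this page; each is out of 25.
subscores = {
    "Maintenance": 0,
    "Adoption": 8,
    "Maturity": 16,
    "Community": 10,
}


def overall_score(parts, per_category_max=25):
    """Return (total, maximum), assuming the overall score is a plain sum.

    Assumption: equal weighting of categories; the page does not state this.
    """
    total = sum(parts.values())
    maximum = per_category_max * len(parts)
    return total, maximum


total, maximum = overall_score(subscores)
print(f"{total} / {maximum}")  # prints "34 / 100"
```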


Stars: 50
Forks: 5
Language: Python
License: MIT
Last pushed: Jan 24, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/cxcscmu/Montessori-Instruct"
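
The same request can be made from Python with only the standard library. This is a minimal sketch, not an official client: the helper names `build_quality_url` and `fetch_quality` are hypothetical, the URL layout is inferred from the single curl example above (with `transformers` treated as a category segment), and the response is assumed to be JSON, which the page does not document.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(owner, repo, category="transformers"):
    """Build the endpoint URL for one repository.

    Assumption: the path is <category>/<owner>/<repo>, inferred from the
    one example URL shown on this page.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(owner, repo, category="transformers", timeout=10):
    """Fetch the quality record for a repository.

    Assumption: the endpoint returns a JSON body; its schema is not
    documented here, so the parsed value is returned unchanged.
    """
    url = build_quality_url(owner, repo, category)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# print(json.dumps(fetch_quality("cxcscmu", "Montessori-Instruct"), indent=2))
```

Keeping URL construction separate from the network call makes the path logic easy to check offline and keeps unkeyed usage within the daily request limit during testing.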

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.