styfeng/TinyDialogues
Code & data for the EMNLP 2024 paper: Is Child-Directed Speech Effective Training Data for Language Models?
This project helps researchers and computational linguists explore how language models learn from speech directed at children versus adults. It provides tools to process child-directed and adult-directed speech datasets, format them for training, and then train and evaluate language models such as GPT-2 and RoBERTa on this data. The results help assess how effective different types of linguistic input are as training data.
No commits in the last 6 months.
Use this if you are a computational linguist or cognitive scientist studying language acquisition and want to investigate how different speech environments impact the development of language models.
Not ideal if you are looking to train a general-purpose, production-ready language model or if you are not working with child-directed speech datasets.
Stars: 12
Forks: 5
Language: Python
License: MIT
Last pushed: Oct 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/styfeng/TinyDialogues"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
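The curl request above can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_quality_url` and `fetch_quality` are hypothetical. It is also an assumption that other repositories follow the same `owner/repo` path pattern. The response schema is not documented here, so the sketch returns parsed JSON when the body parses and raw text otherwise.

```python
import json
import urllib.request

# Base path copied from the curl example. Treating "transformers" as a fixed
# path segment for other repositories is an assumption.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """GET the endpoint; return parsed JSON if the body parses, else raw text."""
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        # The response format is not specified in this listing, so fall back
        # to the undecoded text rather than guessing at a schema.
        return body
```

Calling `fetch_quality("styfeng", "TinyDialogues")` requests the same URL as the curl command. Each call counts against the daily request limit.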
Higher-rated alternatives
worldbank/REaLTabFormer
A suite of auto-regressive and Seq2Seq (sequence-to-sequence) transformer models for tabular and...
MagedSaeed/generate-sequences
A python package made to generate sequences (greedy and beam-search) from Pytorch (not...
tlkh/t2t-tuner
Convenient Text-to-Text Training for Transformers
NohTow/PPL-MCTS
Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through...
readme-generator/alreadyme-ai-serving
Serving large language model with transformers