MaxLSB/le-carnet

LeCarnet is a corpus of more than 2 million simple French stories

Score: 35 / 100 (Emerging)

LeCarnet is a collection of over 2 million simple French children's stories designed for training and evaluating small language models (SLMs). It provides synthetically generated French text with basic vocabulary, allowing AI researchers and educators to develop and test models focused on foundational French language understanding and generation. Users get access to both the raw story data and pre-trained models.

No commits in the last 6 months.

Use this if you are an AI researcher or educator developing or evaluating small language models specifically for French, especially for educational or experimental purposes with simplified vocabulary.

Not ideal if you need a dataset for advanced, nuanced, or colloquial French language models, or if your application requires content other than simple children's stories.

French-language-learning AI-model-training natural-language-generation educational-AI text-data-generation
Stale 6m · No Package · No Dependents
Maintenance 2 / 25
Adoption 5 / 25
Maturity 15 / 25
Community 13 / 25
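
The listing does not state how the overall score is derived, but the four category scores shown above add up to the overall 35 / 100. A minimal sketch of that arithmetic, assuming the total is a plain sum of the categories (the dictionary name is illustrative):

```python
# Category scores exactly as shown in this listing (each out of 25).
category_scores = {
    "Maintenance": 2,
    "Adoption": 5,
    "Maturity": 15,
    "Community": 13,
}

# Assumption: the overall score is the plain sum of the four categories.
total = sum(category_scores.values())
maximum = 25 * len(category_scores)

print(f"{total} / {maximum}")  # prints "35 / 100"
```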


Stars: 10
Forks: 2
Language: Python
License: MIT
Last pushed: Aug 08, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/MaxLSB/le-carnet"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
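
The same request can be made from Python with only the standard library. This is a rough sketch rather than a documented client. The helper names `quality_url` and `fetch_quality` are hypothetical, the category/owner/repo path pattern is inferred from the single curl URL above, and the response is assumed to be JSON, which the listing does not confirm.

```python
import json
import urllib.request

# Base path taken from the curl example in this listing.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build an endpoint URL in the same shape as the curl example.

    Assumption: other repositories follow the same
    <category>/<owner>/<repo> pattern.
    """
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key.

    Assumption: the body is JSON; its fields are not described here,
    so the decoded object is returned as-is.
    """
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Builds the URL only; no network request is made here.
print(quality_url("nlp", "MaxLSB", "le-carnet"))
```

Calling `fetch_quality("nlp", "MaxLSB", "le-carnet")` performs the actual request. Without a key each call counts toward the 100 requests/day limit, so caching the result locally is a sensible choice.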