MaxLSB/le-carnet
LeCarnet is a 2M+ corpus of simple French stories
LeCarnet is a collection of over 2 million simple French children's stories designed for training and evaluating small language models (SLMs). It provides synthetically generated French text with basic vocabulary, allowing AI researchers and educators to develop and test models focused on foundational French language understanding and generation. Users get access to both the raw story data and pre-trained models.
No commits in the last 6 months.
Use this if you are an AI researcher or educator developing or evaluating small language models specifically for French, especially for educational or experimental purposes with simplified vocabulary.
Not ideal if you need a dataset for advanced, nuanced, or colloquial French language models, or if your application needs content other than simple children's stories.
Stars: 10
Forks: 2
Language: Python
License: MIT
Category:
Last pushed: Aug 08, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/MaxLSB/le-carnet"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
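The same request can be made from Python. This is a minimal sketch using only the standard library, not an official client. The helper names `build_quality_url` and `fetch_quality` are hypothetical. The `category/owner/repo` path layout is inferred from the single URL shown above, and the JSON response format is an assumption, since the page does not document the response.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL, percent-encoding each path segment.

    The category/owner/repo layout is an assumption inferred from
    the one example URL on this page.
    """
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return f"{BASE_URL}/{path}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the body as JSON.

    Assumption: the endpoint returns JSON. The page does not say so;
    json.loads raises ValueError if the body is something else.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "MaxLSB", "le-carnet")` targets the same URL as the curl command. Each call counts toward the keyless limit of 100 requests/day, so caching the result locally is sensible when polling repeatedly.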
Higher-rated alternatives
chakki-works/seqeval
A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)
Hironsan/anago
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
jbesomi/texthero
Text preprocessing, representation and visualization from zero to hero.
hamelsmu/ktext
Utilities for preprocessing text for deep learning with Keras
asahi417/tner
Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An...