MaxLSB/le-carnet

LeCarnet is a corpus of over 2 million simple French stories

/ 100

Emerging

LeCarnet is a collection of over 2 million simple French children's stories designed for training and evaluating small language models (SLMs). It provides synthetically generated French text with basic vocabulary, allowing AI researchers and educators to develop and test models focused on foundational French language understanding and generation. Users get access to both the raw story data and pre-trained models.

No commits in the last 6 months.

Use this if you are an AI researcher or educator developing or evaluating small language models specifically for French, especially for educational or experimental purposes with simplified vocabulary.

Not ideal if you need a dataset for advanced, nuanced, or colloquial French language models, or if your application requires a different type of content beyond simple children's stories.

French-language-learning AI-model-training natural-language-generation educational-AI text-data-generation

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 5 / 25

Maturity 15 / 25

Community 13 / 25
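As a quick check on the four category scores above, the sketch below adds them up. It assumes the overall score is the plain sum of the four categories, each capped at 25; the page does not state the aggregation rule, so treat that as an assumption. The names `category_scores` and `overall_score` are hypothetical and used only for illustration.

```python
# Category scores as listed on this page, each out of 25.
category_scores = {
    "Maintenance": 2,
    "Adoption": 5,
    "Maturity": 15,
    "Community": 13,
}
MAX_PER_CATEGORY = 25


def overall_score(scores, max_per_category=MAX_PER_CATEGORY):
    """Return (total, maximum) under the ASSUMED plain-sum rule."""
    for name, value in scores.items():
        # Reject values outside the 0..25 range used by each category.
        if not 0 <= value <= max_per_category:
            raise ValueError(f"{name} score {value} is out of range")
    return sum(scores.values()), max_per_category * len(scores)


total, maximum = overall_score(category_scores)
print(f"{total} / {maximum}")
```

Under that assumption the four categories share a common 0 to 25 scale, so the total is directly comparable across projects on a 0 to 100 scale.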


Language: Python

License: MIT

Higher-rated alternatives

chakki-works/seqeval

A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)

Hironsan/anago

Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

jbesomi/texthero

Text preprocessing, representation and visualization from zero to hero.

hamelsmu/ktext

Utilities for preprocessing text for deep learning with Keras

asahi417/tner

Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An...
