anaistack/cefr-asag-corpus

A corpus of short answers written by learners of English and graded with CEFR levels

Score: 21 / 100 (Experimental)

This dataset provides short English answers from non-native speakers, each linked to a specific language proficiency level defined by the Common European Framework of Reference for Languages (CEFR). Some answers also include CEFR levels assigned by certified examiners. It's designed for researchers, language educators, and computational linguists studying second language acquisition and automated assessment.

No commits in the last 6 months.

Use this if you are developing or evaluating systems for automatically grading English proficiency from short written responses, or for research into language learner errors at different CEFR levels.

Not ideal if you need a corpus of long-form essays or spoken language, or if you require proficiency grading outside of the CEFR framework.

Tags: language-assessment, english-learning, cefr-grading, educational-technology, applied-linguistics
Status: Stale (6m), No Package, No Dependents
Maintenance 0 / 25
Adoption 5 / 25
Maturity 16 / 25
Community 0 / 25
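
The four category scores above add up to the overall 21 / 100 figure. The short check below is a sketch under one assumption the page does not state: that the overall score is the plain, unweighted sum of the four categories, each out of 25.

```python
# Category scores as shown on this page, each out of 25.
category_scores = {
    "Maintenance": 0,
    "Adoption": 5,
    "Maturity": 16,
    "Community": 0,
}

# Assumption (not confirmed by the page): the overall score is the
# unweighted sum of the four category scores.
overall = sum(category_scores.values())
max_overall = 25 * len(category_scores)

print(f"{overall} / {max_overall}")  # 21 / 100
```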


Stars: 12
Forks: (not listed)
Language: (not listed)
License: (not listed)
Last pushed: Dec 17, 2021
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/anaistack/cefr-asag-corpus"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
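
The same request can be made from Python using only the standard library. This is a minimal sketch, not a documented client. The helper names `build_url` and `fetch_quality` are hypothetical, and reading the URL path as category / owner / repository is inferred from the curl example alone. It also assumes the response body is JSON, which the page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL (hypothetical helper).

    Assumption: the three trailing path segments in the curl example
    are a category, a repository owner, and a repository name.
    """
    path = "/".join(quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + path


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Request the endpoint and decode the body.

    Assumption: the response is JSON; json.loads raises an error otherwise.
    """
    url = build_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "anaistack", "cefr-asag-corpus")` requests the same URL as the curl command. Without a key, the 100 requests/day limit applies, so it makes sense to cache the result rather than call it in a loop.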