nert-nlp/streusle
STREUSLE: a corpus with comprehensive lexical semantic annotation (multiword expressions, supersenses)
This project provides a meticulously annotated English corpus: web review text enriched with annotations for multiword expressions (like 'kick the bucket') and semantic supersenses for nouns, verbs, and prepositions. The output is a structured dataset (in CoNLL-U or JSON) that records the grammatical status and meaning category of individual words and phrases. It is designed for computational linguists and natural language processing researchers who need high-quality, fine-grained semantic data for training models or linguistic analysis.
Available on PyPI.
Use this if you need a richly annotated English text dataset to understand the precise grammatical and semantic roles of words and multiword expressions, especially for research in lexical semantics or building advanced NLP systems.
Not ideal if you're looking for a simple, general-purpose text corpus without deep lexical semantic annotations, or if your primary interest is in high-level sentiment analysis rather than detailed linguistic structure.
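To make the annotation layers described above more concrete, here is a minimal sketch of reading multiword expressions and supersenses from one sentence record in the JSON form. The field names (`toks`, `swes`, `smwes`, `lexcat`, `ss`, `toknums`) are assumptions about the layout and should be checked against the corpus documentation. The sample sentence, its labels, and the helper `lexical_units` are made up for illustration.

```python
# Illustrative sentence record. Field names are assumptions about the JSON
# layout, not a verified schema; the annotations are invented for this demo.
SAMPLE = {
    "sent_id": "demo-001",
    "text": "He kicked the bucket in Paris",
    "toks": [
        {"#": 1, "word": "He"},
        {"#": 2, "word": "kicked"},
        {"#": 3, "word": "the"},
        {"#": 4, "word": "bucket"},
        {"#": 5, "word": "in"},
        {"#": 6, "word": "Paris"},
    ],
    # Single-word expressions, keyed by an arbitrary string id.
    "swes": {
        "1": {"lexcat": "PRON", "ss": None, "toknums": [1]},
        "5": {"lexcat": "P", "ss": "p.Locus", "toknums": [5]},
        "6": {"lexcat": "N", "ss": "n.LOCATION", "toknums": [6]},
    },
    # Strong multiword expressions: several tokens form one lexical unit.
    "smwes": {
        "1": {"lexcat": "V.VID", "ss": "v.change", "toknums": [2, 3, 4]},
    },
}


def lexical_units(sent):
    """Return (phrase, lexcat, supersense, is_mwe) tuples in sentence order."""
    words = {tok["#"]: tok["word"] for tok in sent["toks"]}
    units = []
    for key, is_mwe in (("swes", False), ("smwes", True)):
        for entry in sent.get(key, {}).values():
            phrase = " ".join(words[n] for n in entry["toknums"])
            first = entry["toknums"][0]
            units.append((first, phrase, entry["lexcat"], entry["ss"], is_mwe))
    # Order units by the position of their first token.
    units.sort(key=lambda u: u[0])
    return [u[1:] for u in units]


for phrase, lexcat, supersense, is_mwe in lexical_units(SAMPLE):
    kind = "MWE" if is_mwe else "word"
    print(f"{phrase!r:22} {kind:5} {lexcat:6} {supersense}")
```

Treating single-word and multiword expressions uniformly as lexical units keeps downstream code (for example, counting supersenses) independent of how many tokens an expression spans.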
Stars: 72
Forks: 19
Language: Python
License: CC-BY-SA-4.0
Category:
Last pushed: Nov 15, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/nert-nlp/streusle"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
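The same request can be made from Python using only the standard library. This is a sketch under two assumptions: that the path follows a `quality/<category>/<owner>/<repo>` pattern (inferred from the single URL above) and that the response body is JSON. The helper names `build_url` and `fetch_quality` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_url(category, owner, repo):
    """Build the endpoint URL; the path pattern is an assumption."""
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return "/".join([BASE_URL] + parts)


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the record for one repository, assuming a JSON response body."""
    with urlopen(build_url(category, owner, repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("nlp", "nert-nlp", "streusle")` issues the same GET request as the curl command. Unauthenticated calls count against the 100 requests/day limit.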
Related tools
bretttolbert/verbecc
Verbe Complete Conjugator (verbecc) supports Catalan, Spanish, French, Italian, Portuguese and...
natasha/yargy
Rule-based facts extraction for Russian language
bjascob/LemmInflect
A python module for English lemmatization and inflection.
google-research/turkish-morphology
A two-level morphological analyzer for Turkish.
Ars-Linguistica/mlconjug3
A Python library to conjugate verbs in French, English, Spanish, Italian, Portuguese and...