EmilStenstrom/conllu
A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.
This tool helps natural language processing practitioners convert CoNLL-U formatted text, typically the output of taggers and dependency parsers, into structured Python data. You pass in the CoNLL-U text and get back a list of sentences, where each sentence is a list of tokens with their linguistic annotations readily accessible. It is designed for those working with linguistic data who need to access and manipulate parsed text programmatically.
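To make the "list of sentences, each a list of tokens" shape concrete, here is a minimal standard-library sketch of how CoNLL-U text maps onto that structure. It is not the conllu library's implementation: the function name `parse_conllu_sketch` and the sample sentence are made up for illustration, every value is left as a raw string, and only the ten column names are taken from the CoNLL-U format.

```python
# Illustrative sketch only: NOT the conllu library's implementation.
# The ten column names follow the CoNLL-U format; everything else is hypothetical.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu_sketch(text):
    """Split CoNLL-U text into a list of sentences, each a list of token dicts."""
    sentences = []
    # Sentences are separated by a blank line.
    for block in text.strip().split("\n\n"):
        tokens = []
        for line in block.splitlines():
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and sentence-level comment lines
            values = line.split("\t")  # one tab-separated column per field
            tokens.append(dict(zip(FIELDS, values)))
        if tokens:
            sentences.append(tokens)
    return sentences


# A made-up three-token sentence, built with explicit tabs for readability.
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "NNS", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "VBP", "_", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"]),
    "",
])

sentences = parse_conllu_sketch(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["upos"], token["deprel"])
```

A real parser does more than this sketch, for example converting numeric fields and unpacking the feature column, which is the work the library takes off your hands.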
320 stars. Used by 9 other packages. Available on PyPI.
Use this if you are a developer working with linguistic data in Python and need to easily read, manipulate, or write CoNLL-U formatted text within your applications.
Not ideal if you are looking for a tool to perform natural language processing tasks itself, such as part-of-speech tagging or dependency parsing, rather than just parsing the output of such tasks.
Stars: 320
Forks: 53
Language: Python
License: MIT
Last pushed: Mar 15, 2026
Commits (30d): 0
Reverse dependents: 9
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/EmilStenstrom/conllu"
Open to everyone: 100 requests/day with no key, or 1,000/day with a free key.
Related tools
OpenPecha/Botok
🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python
taishi-i/nagisa
A Japanese tokenizer based on recurrent neural networks
zaemyung/sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
natasha/razdel
Rule-based token and sentence segmentation for the Russian language
polm/cutlet
Japanese to romaji converter in Python