EmilStenstrom/conllu

A CoNLL-U parser that takes a CoNLL-U formatted string and turns it into a nested Python dictionary.

This tool helps natural language processing practitioners convert CoNLL-U text (the tab-separated annotation format used by Universal Dependencies treebanks and by many NLP pipelines) into Python data structures. You pass in a CoNLL-U string and get back a list of sentences, each a list of tokens whose annotations (word form, lemma, part-of-speech tag, dependency head, and so on) are accessible by field name. It is designed for people working with linguistic data who need to access and manipulate annotated text programmatically.
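To make the output shape concrete, here is a minimal standard-library sketch of the idea: split the text on blank lines into sentences, then map each token line's ten tab-separated columns to field names. This is a hypothetical illustration, not conllu's own implementation. The library's real entry point is `conllu.parse(data)`, which returns a list of `TokenList` objects and also converts some fields (for example numeric `id` and `head`), whereas this sketch keeps every value as a string and simply skips `#` comment lines. The names `FIELDS` and `parse_conllu` are made up for the example.

```python
# Hypothetical stdlib sketch of the CoNLL-U layout; not the conllu library's own code.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> list[list[dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts keyed by column name."""
    sentences: list[list[dict[str, str]]] = []
    current: list[dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Comment/metadata lines (e.g. "# text = ...") are skipped in this sketch.
            continue
        else:
            # Each token line holds ten tab-separated columns.
            current.append(dict(zip(FIELDS, line.split("\t"))))
    # Handle input that does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


SAMPLE = (
    "# text = The quick fox\n"
    "1\tThe\tthe\tDET\tDT\t_\t3\tdet\t_\t_\n"
    "2\tquick\tquick\tADJ\tJJ\t_\t3\tamod\t_\t_\n"
    "3\tfox\tfox\tNOUN\tNN\t_\t0\troot\t_\t_\n"
    "\n"
)

sentences = parse_conllu(SAMPLE)
for token in sentences[0]:
    print(token["form"], token["upos"], token["head"], token["deprel"])
```

Running it prints one line per token (`The DET 3 det`, `quick ADJ 3 amod`, `fox NOUN 0 root`), which mirrors the "list of sentences, each a list of tokens" structure described above.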

320 stars. Used by 9 other packages. Available on PyPI.

Use this if you are a developer working with linguistic data in Python and need to easily read, manipulate, or write CoNLL-U formatted text within your applications.
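The read, manipulate, and write cycle can be sketched the same way: edit fields on the token dicts, then join each token's columns with tabs and end each sentence with a blank line. This is again a hypothetical standard-library illustration with made-up helper names (`token_from_line`, `serialize_conllu`). With the actual library you would edit the tokens returned by `conllu.parse` and call the sentence's `serialize()` method instead.

```python
# Hypothetical stdlib sketch; conllu itself provides TokenList.serialize() for this step.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def token_from_line(line: str) -> dict[str, str]:
    """Build a token dict from one tab-separated CoNLL-U line."""
    return dict(zip(FIELDS, line.split("\t")))


def serialize_conllu(sentences: list[list[dict[str, str]]]) -> str:
    """Write sentences back as CoNLL-U: one token per line, a blank line after each sentence."""
    blocks = []
    for sentence in sentences:
        lines = ["\t".join(token[field] for field in FIELDS) for token in sentence]
        blocks.append("\n".join(lines) + "\n\n")
    return "".join(blocks)


sentence = [
    token_from_line("1\tDogs\tdog\tNOUN\tNNS\t_\t2\tnsubj\t_\t_"),
    token_from_line("2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\t_"),
]

# Manipulate: record plural number on every noun ("_" marks an empty field).
for token in sentence:
    if token["upos"] == "NOUN":
        token["feats"] = "Number=Plur"

output = serialize_conllu([sentence])
print(output, end="")
```

Keeping tokens as plain dicts keyed by column name is what makes this kind of in-place editing straightforward before writing the text back out.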

Not ideal if you are looking for a tool to perform natural language processing tasks itself, such as part-of-speech tagging or dependency parsing, rather than just parsing the output of such tasks.

natural-language-processing computational-linguistics data-parsing linguistic-annotation text-analysis

Maintenance 13 / 25

Adoption 15 / 25

Maturity 25 / 25

Community 20 / 25

Stars: 320

Language: Python

License: MIT

Related tools

OpenPecha/Botok

🏷 བོད་ཏོག [pʰøtɔk̚] Tibetan word tokenizer in Python

taishi-i/nagisa

A Japanese tokenizer based on recurrent neural networks

zaemyung/sentsplit

A flexible sentence segmentation library using a CRF model and regex rules

natasha/razdel

Rule-based token and sentence segmentation for Russian

polm/cutlet

Japanese to romaji converter in Python