proycon/colibri-core

Colibri Core is an NLP tool as well as a C++ and Python library for working with basic linguistic constructions such as n-grams and skipgrams (i.e. patterns with one or more gaps, either of fixed or dynamic size) in a quick and memory-efficient way. At the core is the tool ``colibri-patternmodeller``, which allows you to build, view, manipulate and query pattern models.

Score: 53 / 100 (Established)

This tool helps linguists and language researchers efficiently analyze large collections of text (corpora) to find common word patterns. You provide a text corpus, and it generates models of n-grams, skipgrams, and flexgrams, along with their frequencies and relationships. This is ideal for computational linguists, sociolinguists, or anyone performing detailed corpus analysis.
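
To make the pattern types concrete, here is a minimal pure-Python sketch of counting n-grams and fixed-gap skipgrams. It does not use Colibri Core or reproduce its API or memory-efficient encoding; the function names and the `{*}` gap marker are assumptions chosen for illustration, and dynamic-size gaps (flexgrams) are not covered.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous window of n tokens as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def skipgrams(tokens, n, gap="{*}"):
    """Yield each n-gram with one interior token replaced by a gap marker.

    Illustrative only: a single fixed-size gap of one token, never at the
    first or last position of the pattern.
    """
    for gram in ngrams(tokens, n):
        for j in range(1, n - 1):
            yield gram[:j] + (gap,) + gram[j + 1:]


def count_patterns(tokens, n, min_count=1):
    """Count n-grams and skipgrams of length n, keeping those seen at least min_count times."""
    counts = Counter(ngrams(tokens, n))
    counts.update(skipgrams(tokens, n))
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


tokens = "a b c a d c".split()
frequent = count_patterns(tokens, n=3, min_count=2)
print(frequent)  # {('a', '{*}', 'c'): 2}
```

In this toy corpus no trigram repeats, but the skipgram `a {*} c` occurs twice, which is the kind of gapped regularity a pattern model is meant to surface. A plain `Counter` like this holds every pattern in memory as Python tuples, so it is only suitable for small inputs.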

Use this if you need to quickly identify and count recurring word sequences or patterns with gaps in very large text datasets without running out of memory.

Not ideal if you're only working with small text files or need advanced semantic understanding beyond pattern extraction and frequency counting.

Topics: computational-linguistics, corpus-analysis, text-mining, natural-language-processing, linguistic-research
No package, no dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 17 / 25

Stars: 129
Forks: 20
Language: C++
License: GPL-3.0
Last pushed: Feb 05, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/nlp/proycon/colibri-core"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
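
The same request can be made from Python with only the standard library. This is a sketch under assumptions: the names `quality_url` and `fetch_quality` are hypothetical, the reading of the URL path as category/owner/repository is inferred from the single example above, and the response format is not documented here, so the body is returned as raw text rather than parsed.

```python
import urllib.request

# Base path taken from the curl example; everything after it is assumed to be
# category / owner / repository.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category, owner, repo):
    """Build the endpoint URL for one repository (hypothetical helper)."""
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category, owner, repo, timeout=10):
    """Fetch the raw response body as text; the schema is not assumed."""
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_quality("nlp", "proycon", "colibri-core")` issues the same GET request as the curl command. Keyless access is limited to 100 requests/day, so cache results when polling several repositories.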