ocramz/ncd-tree
text similarity search trees based on Normalized Compression Distance
This is a Haskell library for developers who need to measure how similar pieces of text or other data sequences are. It builds a search-tree index over a collection of documents or data, which you can then query to find the items most similar to a given input. Because similarity is measured with Normalized Compression Distance, items are compared by the regularities a compressor can exploit, so the library does not need to understand their content.
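Normalized Compression Distance is defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the compressed length of s. The sketch below illustrates that formula only; it is not the ncd-tree API. To stay dependency-free it assumes a hypothetical `lz78Size` that approximates C by counting the phrases of a simple LZ78 parse, in place of a real compressor such as zlib. `lz78Size` and `ncd` are illustrative names.

```haskell
import qualified Data.Set as Set

-- | Approximate compressed length C(s): the number of phrases in an LZ78
-- parse of s. Hypothetical stand-in for a real compressor such as zlib.
lz78Size :: String -> Int
lz78Size = go Set.empty "" 0
  where
    -- dict: phrases seen so far; cur: phrase being extended; n: phrases emitted
    go :: Set.Set String -> String -> Int -> String -> Int
    go _ cur n [] = if null cur then n else n + 1
    go dict cur n (c : cs)
      | cur' `Set.member` dict = go dict cur' n cs -- keep extending a known phrase
      | otherwise = go (Set.insert cur' dict) "" (n + 1) cs -- emit a new phrase
      where
        cur' = cur ++ [c]

-- | Normalized Compression Distance:
-- NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
-- Smaller values mean the inputs share more structure.
ncd :: String -> String -> Double
ncd x y
  | cmax == 0 = 0 -- both inputs empty
  | otherwise = fromIntegral (cxy - cmin) / fromIntegral cmax
  where
    cx = lz78Size x
    cy = lz78Size y
    cxy = lz78Size (x ++ y)
    cmin = min cx cy
    cmax = max cx cy
```

With these definitions, `ncd x x` is smaller than `ncd x z` for a repetitive string `x` and a string `z` of unrelated characters. For short inputs, though, `ncd x x` stays well above 0, because a weak compressor only partly exploits the repetition, so NCD values are best read relative to one another.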
Use this if you are a Haskell developer building an application that needs to quickly find similar text snippets, code fragments, or data sequences without extensive feature engineering.
Not ideal if you are not a Haskell developer or if your application requires a precise, exhaustive search rather than an approximate one.
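This page does not say which tree structure ncd-tree uses. Purely as an illustration, the sketch below assumes a vantage-point (VP) tree, a common metric-space index, parameterized by any distance function so that an NCD function could be plugged in. `VPTree`, `build`, and `nearest` are hypothetical names. The pruning step relies on the triangle inequality, which NCD satisfies only approximately, so with NCD as the distance the search can miss the true nearest item. That is one way the approximate-versus-exhaustive trade-off above can arise.

```haskell
import Data.List (sortOn)

-- | Hypothetical vantage-point tree. Each node stores a vantage point,
-- the median distance mu from it to the remaining items, and two subtrees:
-- items closer than mu (inside) and items at distance >= mu (outside).
data VPTree a
  = Leaf
  | Node a Double (VPTree a) (VPTree a)

-- | Build a VP-tree, using the first item of each sublist as vantage point.
build :: (a -> a -> Double) -> [a] -> VPTree a
build _ [] = Leaf
build dist (v : rest) = Node v mu (build dist inside) (build dist outside)
  where
    ds = sortOn snd [(p, dist v p) | p <- rest]
    mu = if null ds then 0 else snd (ds !! (length ds `div` 2))
    inside = [p | (p, dp) <- ds, dp < mu]
    outside = [p | (p, dp) <- ds, dp >= mu]

-- | Nearest-neighbour query returning the closest item and its distance.
-- Subtrees are skipped via the triangle inequality, so the result is exact
-- only when dist is a true metric.
nearest :: (a -> a -> Double) -> a -> VPTree a -> Maybe (a, Double)
nearest dist q = go Nothing
  where
    go best Leaf = best
    go best (Node v mu inT outT) =
      let dv = dist q v
          best' = closer best (v, dv)
          -- search the side that contains q first
          (near, far) = if dv < mu then (inT, outT) else (outT, inT)
          b1 = go best' near
          -- current best distance (infinity if nothing found yet)
          tau = maybe (1 / 0) snd b1
          -- visit the far side only if the ball of radius tau around q crosses mu
          needFar = if dv < mu then dv + tau >= mu else dv - tau < mu
       in if needFar then go b1 far else b1

    closer Nothing c = Just c
    closer (Just b) c = Just (if snd c < snd b then c else b)
```

Given a distance such as a hypothetical `ncd :: String -> String -> Double`, a query would look like `nearest ncd query (build ncd docs)`. Keeping the tree generic over the distance lets the same structure be tested with an exact metric (for example `\a b -> abs (a - b)` on numbers) before switching to NCD.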
Stars: 10
Forks: 1
Language: Haskell
License: BSD-3-Clause
Last pushed: Dec 24, 2025
Commits (30d): 0
Higher-rated alternatives
shibing624/similarity
similarity: Text similarity calculation toolkit for Java, usable for text similarity computation, sentiment analysis, and similar tasks; works out of the box.
eBay/Sequence-Semantic-Embedding
Tools and recipes to train deep learning models and build services for NLP tasks such as text...
RandolphVI/Text-Pairs-Relation-Classification
About Text Pairs (Sentence Level) Classification (Similarity Modeling) Based on Neural Network.
MartinoMensio/spacy-universal-sentence-encoder
Google USE (Universal Sentence Encoder) for spaCy
piotrmaciejbednarski/text-similarity-node
High-performance and memory efficient native C++ text similarity algorithms for Node.js