rragundez/chunkdot

Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K most similar items for a large number of items by chunking the item matrix representation (embeddings) and using Numba to accelerate the calculations.

/ 100

Emerging

This tool helps data scientists and machine learning engineers efficiently find the most similar items within very large datasets. You input item representations, often called 'embeddings,' which can be either dense or sparse, and it outputs a list of the top K most similar (or dissimilar) items for each item in your dataset. This is particularly useful for tasks like recommendation systems or information retrieval.
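As a rough illustration of the idea described above (not chunkdot's actual implementation or API), the following dependency-free sketch computes the top K most similar items by cosine similarity, one chunk of rows at a time, so that only a `chunk_size` by `n_items` block of scores is held in memory at once. The function names, the `chunk_size` parameter, and the choice to exclude each item from its own result list are assumptions made for this example. The real library works on dense or sparse matrices and uses Numba for speed.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k_cosine(embeddings, top_k, chunk_size=2):
    """Hypothetical helper: for each item, return its top_k most similar
    items as (index, cosine_similarity) pairs, excluding the item itself."""
    # After normalization, a dot product equals the cosine similarity.
    unit = [normalize(v) for v in embeddings]
    results = []
    for start in range(0, len(unit), chunk_size):
        chunk = unit[start:start + chunk_size]
        # Similarity block for this chunk only: len(chunk) x n_items scores.
        block = [
            [sum(a * b for a, b in zip(row, other)) for other in unit]
            for row in chunk
        ]
        # Reduce each row of the block to its top_k entries, then discard the block.
        for offset, scores in enumerate(block):
            i = start + offset
            candidates = ((s, j) for j, s in enumerate(scores) if j != i)
            best = heapq.nlargest(top_k, candidates)
            results.append([(j, s) for s, j in best])
    return results


# Four toy 2-dimensional embeddings: items 0 and 1 point roughly along the
# first axis, items 2 and 3 roughly along the second.
embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
neighbours = top_k_cosine(embeddings, top_k=1)
for item, found in enumerate(neighbours):
    print(item, found)
```

Because each block is reduced to K entries per row before the next chunk is processed, peak memory grows with `chunk_size` rather than with the full number of item pairs, which is the trade-off that makes the approach practical for very large item sets.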

No commits in the last 6 months. Available on PyPI.

Use this if you need to calculate the top K most similar items for a large number of items and want to do so quickly and memory-efficiently, even with datasets containing hundreds of thousands or millions of items.

Not ideal if your dataset is small or if you need to calculate exact similarity scores for every single pair of items rather than just the top K.

similarity-search recommendation-systems information-retrieval large-scale-data-analysis machine-learning-engineering

Stale 6m

Maintenance 0 / 25

Adoption 9 / 25

Maturity 25 / 25

Community 8 / 25

Language

Python

License

MIT

Higher-rated alternatives

Azure/azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.

curiosity-ai/catalyst

🚀 Catalyst is a C# Natural Language Processing library built for speed. Inspired by spaCy's...

supabase/embeddings-generator

GitHub Action to generate embeddings from the markdown files in your repository.

vector-ai/vectorai

Vector AI — A platform for building vector based applications. Encode, query and analyse data...

wagtail/wagtail-vector-index

Store Wagtail pages & Django models as embeddings in vector databases
