ashvardanian/JaccardIndex

Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables


Emerging

This project helps data scientists and machine learning engineers speed up Jaccard similarity calculations across very large collections of binary vectors. It takes collections of binary vectors as input and returns their similarity scores faster than standard methods, which is particularly useful in large-scale vector search and information retrieval systems.
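To make the computation concrete, here is a minimal sketch of a bit-level Jaccard Index using a 256-entry lookup table, one of the techniques named in the project title. It is an illustration only, not the project's code: the names `POPCOUNT_TABLE`, `popcount`, and `jaccard_index` are hypothetical, the vectors are assumed to be packed into `bytes` of equal length, and treating two all-zero vectors as identical is an assumed convention.

```python
# Hypothetical sketch, not the project's API.
# POPCOUNT_TABLE[b] holds the number of set bits in the byte value b.
POPCOUNT_TABLE = bytes(bin(b).count("1") for b in range(256))


def popcount(data: bytes) -> int:
    """Count set bits in a packed bit vector, one table lookup per byte."""
    return sum(POPCOUNT_TABLE[b] for b in data)


def jaccard_index(a: bytes, b: bytes) -> float:
    """Jaccard Index |A AND B| / |A OR B| of two packed bit vectors."""
    if len(a) != len(b):
        raise ValueError("bit vectors must have the same length")
    intersection = sum(POPCOUNT_TABLE[x & y] for x, y in zip(a, b))
    union = sum(POPCOUNT_TABLE[x | y] for x, y in zip(a, b))
    # Assumed convention: two all-zero vectors count as identical.
    return intersection / union if union else 1.0


a = bytes([0b11110000, 0b00001111])
b = bytes([0b11000000, 0b00001111])
print(jaccard_index(a, b))  # 6 shared bits out of 8 in the union
```

A pure-Python loop like this only shows the arithmetic; any real speedup would come from vectorized or native implementations of the same idea.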

No commits in the last 6 months.

Use this if you need to calculate Jaccard Index or population counts efficiently on large-scale binary vectors, especially in vector search applications.
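The population-count side can be sketched with a carry-save adder (CSA), the building block of the Harley-Seal approach named in the project title. The version below is a simplified, single-level illustration under stated assumptions, not the project's implementation: `csa` and `popcount_csa` are hypothetical names, the words are assumed to be non-negative Python integers (for example, 64-bit words), and a full Harley-Seal scheme would chain further levels (fours, eights, sixteens) to defer even more counting.

```python
# Hypothetical single-level sketch of the carry-save adder idea.
def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """Add three words bitwise; return (carry, sum) without propagating carries.

    For every bit position: a + b + c == 2 * carry + sum.
    """
    partial = a ^ b
    carry = (a & b) | (partial & c)
    total = partial ^ c
    return carry, total


def popcount_csa(words: list[int]) -> int:
    """Count set bits across all words, doing one bit count per pair of words."""
    count = 0
    ones = 0  # bits with weight 1 that have not been counted yet
    i = 0
    while i + 2 <= len(words):
        twos, ones = csa(ones, words[i], words[i + 1])
        count += 2 * bin(twos).count("1")  # each carry bit stands for two set bits
        i += 2
    if i < len(words):  # odd word left over
        count += bin(words[i]).count("1")
    return count + bin(ones).count("1")


words = [0xFFFFFFFFFFFFFFFF, 0x0F, 0xF0F0, 0x1, 0x8000000000000000]
print(popcount_csa(words))
```

The design point is that the cheap bitwise operations in `csa` replace some of the more expensive per-word counting steps; the total is identical to counting each word separately.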

Not ideal if your similarity calculations don't involve binary, bit-level data or if you're dealing with very small datasets where performance is not a critical concern.

information-retrieval vector-search data-quantization large-scale-data similarity-scoring

Stale 6m, No Package, No Dependents

Maintenance 2 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 8 / 25


Language: Python

License: Apache-2.0

Higher-rated alternatives

MariaDB/server

MariaDB server is a community developed fork of MySQL server. Started by core members of the...

AlayaDB-AI/AlayaLite

AlayaLite – A Fast, Flexible Vector Database for Everyone.

infiniflow/infinity

The AI-native database built for LLM applications, providing incredibly fast hybrid search of...

nnethercott/hannoy

Production-ready KV-backed HNSW implementation in Rust using LMDB

dingodb/dingo

A multi-modal vector database that supports upserts and vector queries using unified SQL...
