ashvardanian/JaccardIndex
Optimizing bit-level Jaccard Index and Population Counts for large-scale quantized Vector Search via Harley-Seal CSA and Lookup Tables
This project speeds up Jaccard similarity computation over very large collections of binary vectors. It takes collections of binary vectors and returns their similarity scores, relying on Harley-Seal carry-save adders (CSA) and lookup tables for the underlying population counts, with the aim of outperforming standard methods. It is intended for data scientists and machine learning engineers who work on large-scale vector search and information retrieval systems.
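For readers unfamiliar with the metric, the bit-level Jaccard index of two binary vectors is the population count of their AND divided by the population count of their OR. The following is a minimal sketch in pure Python, assuming vectors are stored as equal-length byte strings; the names `POPCOUNT_LUT`, `popcount`, and `jaccard` are hypothetical and are not taken from the JaccardIndex repository.

```python
# Illustrative sketch only; helper names are hypothetical, not the project's API.

# 256-entry lookup table: POPCOUNT_LUT[b] is the number of set bits in byte b.
POPCOUNT_LUT = bytes(bin(b).count("1") for b in range(256))


def popcount(data: bytes) -> int:
    """Count set bits in a byte string with one table lookup per byte."""
    return sum(POPCOUNT_LUT[b] for b in data)


def jaccard(a: bytes, b: bytes) -> float:
    """Jaccard index of two equal-length bit-packed vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    intersection = sum(POPCOUNT_LUT[x & y] for x, y in zip(a, b))
    union = sum(POPCOUNT_LUT[x | y] for x, y in zip(a, b))
    # Assumed convention: two all-zero vectors are treated as identical.
    return intersection / union if union else 1.0
```

For example, `jaccard(bytes([0b1100]), bytes([0b1010]))` shares one set bit out of three distinct set bits. A production implementation would typically operate on wider machine words or SIMD registers rather than on single bytes.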
No commits in the last 6 months.
Use this if you need to calculate Jaccard Index or population counts efficiently on large-scale binary vectors, especially in vector search applications.
Not ideal if your similarity calculations don't involve binary, bit-level data or if you're dealing with very small datasets where performance is not a critical concern.
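The Harley-Seal technique named in the project title can be sketched as follows. A carry-save adder (CSA) combines three words into a per-bit sum and a per-bit carry, so eight input words collapse into a single "eights" word that needs only one population count. This is a minimal pure-Python illustration, assuming non-negative integer words; the names `csa`, `popcount_word`, and `popcount_harley_seal` are hypothetical and do not reflect the repository's actual code. In pure Python it is not faster than a direct count; the saving appears in SIMD implementations where population counts are comparatively expensive.

```python
# Illustrative sketch only; function names are hypothetical.

def csa(a: int, b: int, c: int):
    """Carry-save adder: per bit position, a + b + c == 2 * carry + total."""
    u = a ^ b
    return (a & b) | (u & c), u ^ c


def popcount_word(w: int) -> int:
    """Population count of one non-negative integer word."""
    return bin(w).count("1")


def popcount_harley_seal(words) -> int:
    """Total set bits across a list of words, processed in blocks of eight."""
    eights_total = 0
    ones = twos = fours = 0
    i = 0
    while i + 8 <= len(words):
        w = words[i:i + 8]
        twos_a, ones = csa(ones, w[0], w[1])
        twos_b, ones = csa(ones, w[2], w[3])
        fours_a, twos = csa(twos, twos_a, twos_b)
        twos_a, ones = csa(ones, w[4], w[5])
        twos_b, ones = csa(ones, w[6], w[7])
        fours_b, twos = csa(twos, twos_a, twos_b)
        eights, fours = csa(fours, fours_a, fours_b)
        # Only one population count per eight input words.
        eights_total += popcount_word(eights)
        i += 8
    total = (8 * eights_total + 4 * popcount_word(fours)
             + 2 * popcount_word(twos) + popcount_word(ones))
    # Leftover words (fewer than eight) are counted directly.
    return total + sum(popcount_word(x) for x in words[i:])
```

The accumulators `ones`, `twos`, and `fours` carry partial counts between blocks, which is why they are weighted by 1, 2, and 4 when the final total is assembled.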
Stars: 21
Forks: 2
Language: Python
License: Apache-2.0
Category:
Last pushed: May 18, 2025
Commits (30d): 0
Higher-rated alternatives
MariaDB/server
MariaDB server is a community developed fork of MySQL server. Started by core members of the...
AlayaDB-AI/AlayaLite
AlayaLite – A Fast, Flexible Vector Database for Everyone.
infiniflow/infinity
The AI-native database built for LLM applications, providing incredibly fast hybrid search of...
nnethercott/hannoy
Production-ready KV-backed HNSW implementation in Rust using LMDB
dingodb/dingo
A multi-modal vector database that supports upserts and vector queries using unified SQL...