MaartenGr/PolyFuzz
Fuzzy string matching, grouping, and evaluation.
PolyFuzz helps you clean up messy text data by finding and grouping similar words or phrases, even if they're spelled slightly differently. You give it lists of text, and it identifies potential matches and clusters them, so you can standardize inconsistent entries like 'apple' and 'apples'. This tool is for anyone managing large text datasets who needs to ensure consistency and accuracy, such as data analysts, researchers, or marketers.
792 stars. Used by 2 other packages. No commits in the last 6 months. Available on PyPI.
Use this if you need to quickly find and group variations of text entries in your data, like product names, customer feedback, or scientific terms, to improve data quality and consistency.
Not ideal if your primary goal is exact string matching or if you need to process massive datasets in real-time without prior model fitting.
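To make the matching and grouping described above concrete, here is a minimal sketch in Python. It does not use PolyFuzz or its API. It swaps in the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. The function names (`similarity`, `best_matches`, `group_similar`) and the `min_similarity` threshold of 0.75 are hypothetical choices for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(from_list, to_list, min_similarity=0.75):
    """Map each entry in from_list to its closest entry in to_list.

    The value is a (match, score) tuple; match is None when the best
    score falls below min_similarity (an illustrative threshold).
    """
    matches = {}
    for source in from_list:
        best = max(to_list, key=lambda target: similarity(source, target))
        score = similarity(source, best)
        matches[source] = (best if score >= min_similarity else None, round(score, 3))
    return matches


def group_similar(entries, min_similarity=0.75):
    """Greedily cluster entries.

    Each entry joins the first group whose representative (its first
    member) is similar enough; otherwise it starts a new group.
    """
    groups = []
    for entry in entries:
        for group in groups:
            if similarity(entry, group[0]) >= min_similarity:
                group.append(entry)
                break
        else:
            groups.append([entry])
    return groups


if __name__ == "__main__":
    print(best_matches(["apples", "mouse"], ["apple", "house", "car"]))
    print(group_similar(["apple", "apples", "Apple", "banana", "bananas"]))
```

A greedy pass like this compares every entry against every group representative, so it only suits small lists. It is meant to show the idea of standardizing variants such as 'apple' and 'apples', not how the library performs at scale.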
Stars: 792
Forks: 71
Language: Python
License: MIT
Category:
Last pushed: Jul 10, 2025
Commits (30d): 0
Dependencies: 9
Reverse dependents: 2
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/embeddings/MaartenGr/PolyFuzz"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
deepset-ai/haystack-tutorials
Here you can find all the Tutorials for Haystack 📓
aryn-ai/sycamore
🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.
unum-cloud/USearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C,...
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
pingcap/pytidb
TiDB AI SDK: Unified Multi-Modal Data Platform for AI Apps & Agents - https://pingcap.github.io/ai/