gandersen101/spaczz
Fuzzy matching and more functionality for spaCy.
This tool helps developers working with natural language processing identify specific words or phrases in text, even when the text contains slight misspellings or variations. It takes raw text as input, applies predefined patterns to find and extract matching phrases, and returns each match with a score indicating how closely it matches. It is designed for Python developers who build applications that process and understand human language.
258 stars. No commits in the last 6 months. Available on PyPI.
Use this if you need to reliably find and extract text patterns in documents where perfect spelling or exact phrasing cannot be guaranteed, such as user-generated content or scanned documents.
Not ideal if you require extremely high performance for very large datasets, as the fuzzy matching process can be computationally intensive compared to exact string matching.
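To make the idea of "patterns in, scored matches out" concrete, here is a minimal sketch using only Python's standard library. It is not spaczz's API or implementation: the `fuzzy_find` function, the `min_score` threshold, the labels, and the sample text are all hypothetical names chosen for illustration, and the similarity measure is `difflib.SequenceMatcher` rather than whatever scorer spaczz uses.

```python
from difflib import SequenceMatcher


def fuzzy_find(text, patterns, min_score=75):
    """Return (label, matched_text, score) for each pattern's best-scoring window.

    Scores run from 0 to 100. A pattern whose best window scores below
    min_score is dropped. Hypothetical helper, not part of spaczz.
    """
    tokens = text.split()
    results = []
    for label, phrase in patterns.items():
        width = len(phrase.split())
        best = None
        # Slide a window of the pattern's token length across the text.
        for i in range(len(tokens) - width + 1):
            window = " ".join(tokens[i:i + width])
            ratio = SequenceMatcher(None, window.lower(), phrase.lower()).ratio()
            score = round(100 * ratio)
            if best is None or score > best[2]:
                best = (label, window, score)
        if best is not None and best[2] >= min_score:
            results.append(best)
    return results


if __name__ == "__main__":
    # Illustrative patterns and text with deliberate misspellings.
    patterns = {"NAME": "Grant Andersen", "GPE": "Nashville"}
    for label, matched, score in fuzzy_find("Grint Anderson lives in Nashv1le", patterns):
        print(label, matched, score)
```

Every pattern is compared against every window of the text, so the work grows with the text length multiplied by the number of patterns. That is the cost, relative to exact string matching, that the performance caveat above refers to.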
Stars: 258
Forks: 30
Language: Python
License: MIT
Category:
Last pushed: Jul 06, 2024
Commits (30d): 0
Dependencies: 7
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/gandersen101/spaczz"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Related tools
nltk/nltk: NLTK Source
explosion/spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python
undertheseanlp/underthesea: Underthesea - Vietnamese NLP Toolkit
stanfordnlp/stanza: Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many...
flairNLP/flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)