messiaen/full-lattice-search
Full Text Search Over Probabilistic Lattices with Elasticsearch!
This tool searches large collections of audio transcripts, scanned documents, or machine translations that may contain errors or alternative interpretations. It takes probabilistic lattices (such as those produced by an ASR or OCR system), which represent multiple possible words or phrases at each point, and makes them searchable. Search results stay relevant even when the original transcription is uncertain. It is designed for data analysts, linguists, or operations teams working with imperfect output from automated processing.
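To make the idea of a lattice concrete, here is a minimal Python sketch. It is not the project's implementation (which is written in Java and runs on Elasticsearch) and it simplifies a lattice to a "confusion network": a list of positions, each holding alternative words with probabilities. The data, the `phrase_score` function, and the scoring rule (product of per-word probabilities) are all assumptions made for illustration.

```python
from math import prod

# Hypothetical toy lattice: one dict per position, mapping each
# alternative word to its (assumed) posterior probability.
lattice = [
    {"i": 0.9, "eye": 0.1},
    {"scream": 0.6, "screen": 0.4},
    {"for": 0.7, "four": 0.3},
    {"ice": 0.8, "eyes": 0.2},
]


def phrase_score(lattice, phrase):
    """Return the best probability of `phrase` occurring at consecutive
    positions of the lattice, or 0.0 if it cannot be matched."""
    words = phrase.lower().split()
    if not words:
        return 0.0
    best = 0.0
    # Slide the phrase over every possible start position.
    for start in range(len(lattice) - len(words) + 1):
        # A word missing at a position contributes probability 0.
        probs = [lattice[start + i].get(w, 0.0) for i, w in enumerate(words)]
        best = max(best, prod(probs))
    return best


# The single most likely transcript is "i scream for ice", so a plain
# text search for "screen four" would miss it; the lattice still matches.
score = phrase_score(lattice, "screen four")
print(f"{score:.2f}")
```

In this toy model a query that only appears among the lower-ranked alternatives still gets a non-zero score, which is the property that lets a lattice search surface results a one-best transcript would lose.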
No commits in the last 6 months.
Use this if you need to perform accurate full-text searches across vast amounts of automatically generated text, where each word or phrase might have multiple probabilistic alternatives.
Not ideal if your text data is already perfectly accurate and unambiguous, as the added complexity of lattice search won't provide significant benefits.
Stars: 10
Forks: 2
Language: Java
License: Apache-2.0
Category:
Last pushed: Nov 20, 2020
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/voice-ai/messiaen/full-lattice-search"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
githubharald/CTCDecoder: Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon...
githubharald/CTCWordBeamSearch: Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
nl8590687/ASRT_SpeechRecognition: A Deep-Learning-Based Chinese Speech Recognition System 基于深度学习的中文语音识别系统
athena-team/athena: An open-source implementation of a sequence-to-sequence based speech processing engine
hirofumi0810/tensorflow_end2end_speech_recognition: End-to-End speech recognition implementation based on TensorFlow (CTC, Attention, and MTL training)