sorcero/ingestum
Read-only mirror of https://gitlab.com/sorcero/community/ingestum
When you need to analyze or compare information from many different places like PDFs, HTML pages, images, or even social media feeds, this tool helps you get all that varied content into a clean, uniform text format. It takes diverse source materials and outputs standardized text documents, ready for tasks like document comparison, search, or automated tagging. This is for anyone who works with information from many different sources and needs to process it consistently.
No commits in the last 6 months.
Use this if you regularly work with content from various formats (like PDFs, HTML, or even audio files) and need to convert them into plain, searchable text for analysis or further processing.
Not ideal if you only work with already-clean text documents and don't need to extract or transform content from diverse file types.
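The conversion described above takes varied inputs and turns each into one uniform plain-text record. The minimal Python sketch below illustrates that idea using only the standard library. It is not ingestum's API: `TextDocument`, `normalize`, and the converter functions are hypothetical names. It handles only HTML and plain text; PDF, image, or audio extraction would need dedicated libraries.

```python
# Hypothetical sketch, NOT ingestum's API: normalize varied sources into one
# uniform plain-text record shape using only the standard library.
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class TextDocument:
    """Uniform output record (hypothetical): source kind plus plain text."""
    source_type: str
    content: str


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        # Collapse every whitespace run so all outputs are uniformly spaced.
        return " ".join(" ".join(self._chunks).split())


def html_to_document(html: str) -> TextDocument:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return TextDocument(source_type="html", content=parser.text())


def text_to_document(raw: str) -> TextDocument:
    return TextDocument(source_type="text", content=" ".join(raw.split()))


# One converter per source type; every converter returns a TextDocument.
CONVERTERS = {"html": html_to_document, "text": text_to_document}


def normalize(source_type: str, payload: str) -> TextDocument:
    try:
        converter = CONVERTERS[source_type]
    except KeyError:
        raise ValueError(f"unsupported source type: {source_type}") from None
    return converter(payload)
```

For example, `normalize("html", "<p>Hello <b>world</b></p><script>x()</script>").content` yields `"Hello world"`. The dispatch table keeps every source type behind the same output shape. That uniform shape is what makes downstream comparison, search, or tagging consistent.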
Stars: 7
Forks: —
Language: Python
License: LGPL-3.0
Category: —
Last pushed: Jan 23, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/sorcero/ingestum"
Open to everyone: 100 requests/day with no key needed; a free key raises the limit to 1,000/day.
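The same request can be made from Python with the standard library. This is a sketch under assumptions. The URL layout `/quality/<category>/<owner>/<repo>` is inferred from the single example URL, with `transformers` read as a category segment. The response is assumed to be JSON, and no particular fields are assumed. `quality_url` and `fetch_quality` are hypothetical helper names. API keys are omitted because the note above does not say how a key is passed.

```python
# Sketch: fetch the quality record shown in the curl example.
# Assumptions: the path layout generalizes from the one example URL and the
# endpoint returns JSON; no response fields are assumed.
import json
import urllib.request

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    # Mirrors the example path: /quality/<category>/<owner>/<repo>
    return f"{BASE_URL}/{category}/{owner}/{repo}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    # Performs a GET request (counts against the 100 requests/day limit).
    url = quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_quality("transformers", "sorcero", "ingestum")` requests the same URL as the curl command above.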
Higher-rated alternatives
clusterzx/paperless-ai
An automated document analyzer for Paperless-ngx using OpenAI API, Ollama, Deepseek-r1, Azure...
kha-white/manga-ocr
Optical character recognition for Japanese text, with the main focus being Japanese manga
alephpi/Texo-web
The web application for Texo, a minimalist SOTA LaTeX OCR model which contains only 20M...
bytefer/ollama-ocr
Implementing OCR with a local visual model run by ollama.
alephpi/Texo
A minimalist SOTA LaTeX OCR model with only 20M parameters, running in browser. Full training...