sorcero/ingestum

Read-only mirror of https://gitlab.com/sorcero/community/ingestum

/ 100

Experimental

When you need to analyze or compare information from many different sources, such as PDFs, HTML pages, images, or even social media feeds, this tool converts all that varied content into a clean, uniform text format. It takes diverse source materials and outputs standardized text documents, ready for tasks like document comparison, search, or automated tagging. It is aimed at anyone who works with information from many different sources and needs to process it consistently.
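The core idea, turning heterogeneous inputs into one uniform text record, can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not Ingestum's API: the `TextDocument` record and the `normalize` helper are hypothetical names, and only HTML and plain text are handled. PDF, image, and audio inputs would additionally need PDF parsing, OCR, or speech-to-text components that are not shown.

```python
import re
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class TextDocument:
    """Hypothetical uniform output record: a source label plus normalized text."""
    source: str
    content: str


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script/style element.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self) -> str:
        return " ".join(self._chunks)


def _collapse_whitespace(text: str) -> str:
    # Uniform formatting: runs of whitespace become one space, ends are trimmed.
    return re.sub(r"\s+", " ", text).strip()


def normalize(source: str, raw: str, kind: str) -> TextDocument:
    """Convert raw input of a given kind ("html" or "text") into a TextDocument."""
    if kind == "html":
        parser = _TextExtractor()
        parser.feed(raw)
        parser.close()  # flush any buffered text
        body = parser.text()
    elif kind == "text":
        body = raw
    else:
        # Hypothetical sketch: other formats (pdf, image, audio) are not handled.
        raise ValueError(f"unsupported kind: {kind!r}")
    return TextDocument(source=source, content=_collapse_whitespace(body))
```

For example, `normalize("page.html", "<p>Hello,</p><p>world!</p>", "html")` and `normalize("note.txt", "  Hello,\n\n world!  ", "text")` both yield a `TextDocument` whose `content` is `"Hello, world!"`. Having every source end up in the same shape is what lets downstream comparison, search, or tagging treat them identically.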

No commits in the last 6 months.

Use this if you regularly work with content in various formats (such as PDFs, HTML, or even audio files) and need to convert it into plain, searchable text for analysis or further processing.

Not ideal if you only work with already-clean text documents and don't need to extract or transform content from diverse file types.

content-extraction document-processing information-retrieval data-preparation text-normalization

Stale 6m No Package No Dependents

Maintenance 0 / 25

Adoption 4 / 25

Maturity 16 / 25

Community 0 / 25


Stars

Forks

—

Language: Python

License: LGPL-3.0

Higher-rated alternatives

clusterzx/paperless-ai

An automated document analyzer for Paperless-ngx using OpenAI API, Ollama, Deepseek-r1, Azure...

kha-white/manga-ocr

Optical character recognition for Japanese text, with the main focus being Japanese manga

alephpi/Texo-web

The web application for Texo, a minimalist SOTA LaTeX OCR model which contains only 20M...

bytefer/ollama-ocr

Implementing OCR with a local visual model run by ollama.

alephpi/Texo

A minimalist SOTA LaTeX OCR model with only 20M parameters, running in browser. Full training...
