katanaml/sparrow

Structured data extraction and instruction calling with ML, LLM and Vision LLM

Score: 63 / 100 (Established)

This tool helps businesses and individuals convert documents such as invoices, receipts, bank statements, and forms into organized, structured data. You input an image or PDF document, and it outputs the extracted information as clean, queryable JSON. It is aimed at anyone who regularly processes scanned or digital documents and needs to pull out specific pieces of information quickly.

5,129 stars. Actively maintained with 15 commits in the last 30 days.

Use this if you need to automate the process of extracting specific data points from a high volume of diverse documents like financial statements or forms into a structured, digital format.

Not ideal if you only occasionally process a few simple documents by hand, or if your primary need is general text summarization rather than precise data extraction.

document-processing data-extraction financial-operations record-keeping information-capture
No package. No dependents.
Maintenance 17 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 20 / 25

Stars: 5,129
Forks: 511
Language: Python
License: GPL-3.0
Last pushed: Mar 12, 2026
Commits (30d): 15

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/rag/katanaml/sparrow"

Open to everyone: 100 requests/day with no key needed, or 1,000/day with a free key.
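
The same request can be made from Python. This is a minimal sketch using only the standard library; the endpoint URL is the one shown in the curl command, while the helper names `build_quality_url` and `fetch_quality` are hypothetical, and the assumption that the response body is JSON is not confirmed by this page.

```python
import json
import urllib.request

# Base path copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality/rag"


def build_quality_url(owner: str, repo: str) -> str:
    """Build the quality endpoint URL for an owner/repo pair (hypothetical helper)."""
    return f"{BASE_URL}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str, timeout: float = 10.0):
    """Fetch the quality record without an API key.

    Assumption: the endpoint returns a JSON body. If it does not,
    json.loads will raise a ValueError.
    """
    url = build_quality_url(owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_quality("katanaml", "sparrow")` issues one request against the keyless daily quota, so it is worth caching the result rather than polling.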