aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai
This open-source project provides a complete pipeline for converting multi-page documents (PDFs or images) into structured JSON using vision LLMs on Amazon SageMaker. It uses the SWIFT framework to fine-tune models specifically for document-understanding tasks.
This project helps businesses automate the extraction of specific data from multi-page documents like invoices or contracts. You can feed it PDFs or images, and it will output the key information you need in a structured JSON format, ready for your systems. It's designed for data analysts, operations managers, or IT professionals who deal with large volumes of varied documents.
No commits in the last 6 months.
Use this if you need to reliably convert diverse, multi-page documents into structured data for automated processing, especially if the documents vary widely from one to the next.
Not ideal if you only need simple, generic text extraction or if your documents are already highly structured and machine-readable.
Stars: 15
Forks: 2
Language: Jupyter Notebook
License: MIT-0
Category:
Last pushed: Aug 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
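The curl request above can also be made from a script. The following is a minimal Python sketch using only the standard library. It makes several assumptions that the page does not confirm: that the endpoint returns a JSON body, that the path segments generalize as category/owner/repo (only the single URL above is known), and the helper names build_quality_url and fetch_quality are hypothetical, chosen for illustration.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path copied from the curl command on this page.
BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def build_quality_url(category: str, owner: str, repo: str) -> str:
    """Assemble a URL in the same shape as the curl example.

    Assumption: other repositories follow the same
    /<category>/<owner>/<repo> layout; only one URL is documented.
    """
    parts = [quote(part, safe="") for part in (category, owner, repo)]
    return f"{BASE_URL}/{'/'.join(parts)}"


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Send a GET request and decode the response.

    Assumption: the response body is JSON. If it is not,
    json.loads raises a ValueError.
    """
    url = build_quality_url(category, owner, repo)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_quality("transformers", "aws-samples", "sample-for-multi-modal-document-to-json-with-sagemaker-ai") would issue the same request as the curl command. With the keyless limit of 100 requests/day, it may be worth caching the result locally rather than fetching it on every run.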
Higher-rated alternatives
dorarad/gansformer: Generative Adversarial Transformers
j-min/VL-T5: PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)
invictus717/MetaTransformer: Meta-Transformer for Unified Multimodal Learning
rkansal47/MPGAN: The message passing GAN https://arxiv.org/abs/2106.11535 and generative adversarial particle...
Yachay-AI/byt5-geotagging: Confidence and Byt5 - based geotagging model predicting coordinates from text alone.