aws-samples/sample-for-multi-modal-document-to-json-with-sagemaker-ai

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs or images) into structured JSON using vision LLMs on Amazon SageMaker. It uses the SWIFT framework to fine-tune these models for document-understanding tasks.

Emerging

This project helps businesses automate the extraction of specific data from multi-page documents like invoices or contracts. You can feed it PDFs or images, and it will output the key information you need in a structured JSON format, ready for your systems. It's designed for data analysts, operations managers, or IT professionals who deal with large volumes of varied documents.

No commits in the last 6 months.

Use this if you need to reliably convert diverse, multi-page documents into structured data for automated processing, especially if their layouts and formats vary widely.

Not ideal if you only need simple, generic text extraction or if your documents are already highly structured and machine-readable.

document-processing invoice-automation data-extraction business-operations information-management

Stale 6m · No Package · No Dependents

Maintenance 2 / 25

Adoption 6 / 25

Maturity 16 / 25

Community 10 / 25

Language: Jupyter Notebook

License: MIT-0

Higher-rated alternatives

dorarad/gansformer

Generative Adversarial Transformers

j-min/VL-T5

PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)

invictus717/MetaTransformer

Meta-Transformer for Unified Multimodal Learning

rkansal47/MPGAN

The message passing GAN https://arxiv.org/abs/2106.11535 and generative adversarial particle...

Yachay-AI/byt5-geotagging

Confidence- and ByT5-based geotagging model that predicts coordinates from text alone.