sinanuozdemir/oreilly-multimodal-ai
Learn how multimodal AI merges text, image, and audio for smarter models
This project helps machine learning practitioners build smarter AI models by combining different types of data like text, images, and audio. It provides practical examples and code for tasks such as transcribing speech, answering questions about images, generating images from text, and fine-tuning text-to-speech models. Data scientists and AI engineers who want to integrate multiple data modalities into their applications would find this useful.
No commits in the last 6 months.
Use this if you are a data scientist or AI engineer with intermediate Python skills and foundational machine learning knowledge who wants to develop and experiment with multimodal AI applications.
Not ideal if you are looking for a ready-to-use application and do not have programming or machine learning experience.
Stars: 30
Forks: 14
Language: Jupyter Notebook
License: not specified
Last pushed: Jan 21, 2025
Commits (30d): 0
Higher-rated alternatives
ankanbhunia/Handwriting-Transformers
Handwriting-Transformers (ICCV21)
immex-tech/decor8ai-sdk
Decor8 AI SDK for AI Interior Design And Virtual Property Staging
nv-tlabs/ATISS
Code for "ATISS: Autoregressive Transformers for Indoor Scene Synthesis", NeurIPS 2021
alaradirik/sd-interior-design
Layout preserving realistic interior design using text and image prompts
rudyoactiv/typescribe-handwriting
⚡ Create handwritten documents from text with a Neural Network!