roboflow/maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.

Established

This tool helps machine learning engineers and researchers fine-tune vision-language models such as Florence-2, PaliGemma 2, or Qwen2.5-VL for specific tasks. You supply your own image and text datasets, and it handles the pipeline from data loading through training, producing a fine-tuned model ready for deployment. It is aimed at technical users who are already familiar with model training concepts.

2,661 stars.

Use this if you need to adapt existing multimodal models to new, specialized tasks, such as custom object detection or extracting specific information from images paired with text.

Not ideal if you are looking for a no-code solution or want to build a multimodal model entirely from scratch rather than fine-tuning an existing one.

Machine Learning Engineering, Computer Vision, Natural Language Processing, Model Fine-tuning, AI Development

No Package, No Dependents

Maintenance 10 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 19 / 25

Stars: 2,661

Forks: 221

Language: Python

License: Apache-2.0

Related models

unslothai/unsloth

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama,...

huggingface/peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

modelscope/ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5,...

oumi-ai/oumi

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

linkedin/Liger-Kernel

Efficient Triton Kernels for LLM Training
