xinyanghuang7/Basic-Visual-Language-Model

Build a simple, basic multimodal large language model from scratch. 🤖

Score: 33 / 100 (Emerging)

This project allows AI researchers and machine learning engineers to build a custom multimodal large language model from scratch. You provide image datasets (like COCO or AI Challenger) and corresponding textual annotations, and the project outputs a trained model capable of understanding and generating responses about images. This is for professionals who want to develop new vision-language AI capabilities for specific applications.

No commits in the last 6 months.

Use this if you need to train your own vision-language model with specialized datasets to achieve domain-specific visual comprehension and dialogue capabilities.

Not ideal if you're looking for an off-the-shelf tool to simply use a multimodal model without any training or model architecture modifications.

Tags: AI research, machine learning engineering, multimodal AI, computer vision, natural language processing
Flags: No License · Stale (6m) · No Package · No Dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 8 / 25
Maturity: 8 / 25
Community: 17 / 25


Stars: 47
Forks: 9
Language: Python
License: none
Last pushed: Jun 19, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/xinyanghuang7/Basic-Visual-Language-Model"

Open to everyone: 100 requests/day with no API key needed. A free key raises the limit to 1,000 requests/day.
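The same endpoint can be called programmatically. Below is a minimal Python sketch that builds the request URL from an owner/repo pair and fetches the record with the standard library; it assumes the endpoint returns JSON (the response schema is not documented here, so no specific fields are parsed).

```python
import json
import urllib.request

# Base URL taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(owner: str, repo: str) -> str:
    """Build the quality-score endpoint URL for a given GitHub repo."""
    return f"{API_BASE}/{owner}/{repo}"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON (schema assumed, not documented)."""
    with urllib.request.urlopen(quality_url(owner, repo)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("xinyanghuang7", "Basic-Visual-Language-Model")
    print(json.dumps(data, indent=2))
```

Without an API key this counts against the shared 100 requests/day quota, so cache responses rather than polling.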