waltonfuture/InstructionGPT-4
InstructionGPT-4
This project helps AI researchers and practitioners refine large language models that understand both images and text. It takes a large, general dataset of image-text instructions and processes it to identify the most impactful, high-quality examples. The output is a smaller, highly effective dataset that can be used to fine-tune models like MiniGPT-4, leading to better performance with less data.
No commits in the last 6 months.
Use this if you are a machine learning engineer or AI researcher looking to efficiently improve the performance of multimodal large language models by curating high-quality training data.
Not ideal if you are looking for a pre-trained, ready-to-use chatbot or image analysis tool, as this project focuses on the data preparation step for model training.
Stars
42
Forks
3
Language
Python
License
MIT
Last pushed
Dec 29, 2023
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/waltonfuture/InstructionGPT-4"
Open to everyone: 100 requests/day with no key. A free key raises the limit to 1,000/day.
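For scripted access, the endpoint above can be wrapped in a small helper. This is a minimal sketch in Python's standard library; the `quality_url`/`fetch_quality` names are ours, and the shape of the returned JSON is not documented here, so callers should inspect the result rather than assume fields.

```python
import json
import urllib.request
from urllib.parse import quote

# Base path taken from the curl example above.
API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo: str) -> str:
    """Build the quality-data URL for a GitHub repo like 'owner/name'.

    quote() keeps '/' unescaped by default, so the owner/name path
    segment passes through intact.
    """
    return f"{API_BASE}/{quote(repo)}"

def fetch_quality(repo: str) -> dict:
    """Fetch the quality record (keyless tier: 100 requests/day).

    The response schema is undocumented, so this returns the parsed
    JSON as-is instead of picking out specific fields.
    """
    with urllib.request.urlopen(quality_url(repo), timeout=10) as resp:
        return json.load(resp)

print(quality_url("waltonfuture/InstructionGPT-4"))
```

How an API key should be supplied (header vs. query parameter) is not stated on this page, so the sketch sticks to the keyless tier.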
Higher-rated alternatives
AI-Hypercomputer/maxtext
A simple, performant and scalable Jax LLM!
rasbt/reasoning-from-scratch
Implement a reasoning LLM in PyTorch from scratch, step by step
mindspore-lab/mindnlp
MindSpore + 🤗Huggingface: Run any Transformers/Diffusers model on MindSpore with seamless...
mosaicml/llm-foundry
LLM training code for Databricks foundation models
rickiepark/llm-from-scratch
Code repository for the Korean edition of "Build a Large Language Model (From Scratch)" (Gilbut, 2025)