JosefAlbers/VL-JEPA
VL-JEPA (Vision-Language Joint Embedding Predictive Architecture) in MLX
This project helps machine learning researchers and practitioners understand and experiment with the Vision-Language Joint Embedding Predictive Architecture (VL-JEPA). It takes an existing vision-language model (such as PaliGemma) and reframes it into a JEPA structure, where the model learns to predict target embeddings rather than raw outputs, an approach that can make learning more efficient and robust. The result is a working example of the architecture, offering insight into its mechanics and potential.
Use this if you are an AI researcher or machine learning engineer looking to explore advanced self-supervised learning architectures, specifically VL-JEPA, using the Apple MLX framework.
Not ideal if you are looking for a plug-and-play solution for general image or text analysis, or if you are not familiar with machine learning model architectures and frameworks.
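As a concrete illustration of the JEPA reframing described above, here is a minimal sketch of a JEPA-style vision-language objective in MLX. All class names, dimensions, and the choice of a cosine loss are illustrative assumptions, not the repository's actual code.

# Minimal sketch of a JEPA-style vision-language objective in MLX.
# Everything here (names, dims, loss) is illustrative, not the repo's API.
import mlx.core as mx
import mlx.nn as nn


class ToyVLJEPA(nn.Module):
    """Predict target text embeddings from vision-context embeddings."""

    def __init__(self, vision_dim=768, text_dim=768, embed_dim=512):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, embed_dim)  # context head
        self.text_proj = nn.Linear(text_dim, embed_dim)      # target head
        self.predictor = nn.Sequential(                      # JEPA predictor
            nn.Linear(embed_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def __call__(self, vision_feats, text_feats):
        ctx = self.vision_proj(vision_feats)
        # Targets act as a frozen branch: no gradients flow back through them.
        tgt = mx.stop_gradient(self.text_proj(text_feats))
        pred = self.predictor(ctx)
        # Negative cosine similarity between prediction and target embeddings.
        pred = pred / mx.linalg.norm(pred, axis=-1, keepdims=True)
        tgt = tgt / mx.linalg.norm(tgt, axis=-1, keepdims=True)
        return -mx.mean(mx.sum(pred * tgt, axis=-1))


model = ToyVLJEPA()
loss = model(mx.random.normal((4, 768)), mx.random.normal((4, 768)))
print(loss.item())  # scalar JEPA-style loss on random features

The stop_gradient on the target branch is the characteristic JEPA ingredient: the model predicts in embedding space against targets it cannot directly alter, rather than reconstructing raw pixels or tokens.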
Stars: 76
Forks: 6
Language: Python
License: Apache-2.0
Last pushed: Dec 31, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JosefAlbers/VL-JEPA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
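The same endpoint can also be called from Python. Below is a minimal stdlib-only sketch; the response schema is not documented on this page, so the example simply pretty-prints whatever JSON the service returns.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/JosefAlbers/VL-JEPA"

# Fetch and parse the JSON body from the quality API.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Schema is undocumented here, so print the full payload.
print(json.dumps(data, indent=2))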
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice