JosefAlbers/VL-JEPA
VL-JEPA (Vision-Language Joint Embedding Predictive Architecture) in MLX
This project helps machine learning researchers and practitioners understand and experiment with the Vision-Language Joint Embedding Predictive Architecture (VL-JEPA). It takes an existing vision-language model (such as PaliGemma) and reframes it into a JEPA structure, where the model learns to predict target embeddings rather than raw outputs, an approach that can make learning more efficient and robust. The result is a working example of the architecture, offering insight into its mechanics and potential.
Use this if you are an AI researcher or machine learning engineer looking to explore advanced self-supervised learning architectures, specifically VL-JEPA, using the Apple MLX framework.
Not ideal if you are looking for a plug-and-play solution for general image or text analysis, or if you are not familiar with machine learning model architectures and frameworks.
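As a concrete illustration of the JEPA reframing described above, here is a minimal sketch of a JEPA-style vision-language objective in MLX. All class names, dimensions, and the choice of a cosine loss are illustrative assumptions, not the repository's actual code.

# Minimal sketch of a JEPA-style vision-language objective in MLX.
# Everything here (names, dims, loss) is illustrative, not the repo's API.
import mlx.core as mx
import mlx.nn as nn


class ToyVLJEPA(nn.Module):
    """Predict target text embeddings from vision-context embeddings."""

    def __init__(self, vision_dim=768, text_dim=768, embed_dim=512):
        super().__init__()
        self.vision_proj = nn.Linear(vision_dim, embed_dim)  # context head
        self.text_proj = nn.Linear(text_dim, embed_dim)      # target head
        self.predictor = nn.Sequential(                      # JEPA predictor
            nn.Linear(embed_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def __call__(self, vision_feats, text_feats):
        ctx = self.vision_proj(vision_feats)
        # Targets act as a frozen branch: no gradients flow back through them.
        tgt = mx.stop_gradient(self.text_proj(text_feats))
        pred = self.predictor(ctx)
        # Negative cosine similarity between prediction and target embeddings.
        pred = pred / mx.linalg.norm(pred, axis=-1, keepdims=True)
        tgt = tgt / mx.linalg.norm(tgt, axis=-1, keepdims=True)
        return -mx.mean(mx.sum(pred * tgt, axis=-1))


model = ToyVLJEPA()
loss = model(mx.random.normal((4, 768)), mx.random.normal((4, 768)))
print(loss.item())  # scalar JEPA-style loss on random features

The stop_gradient on the target branch is the characteristic JEPA ingredient: the model predicts in embedding space against targets it cannot directly alter, rather than reconstructing raw pixels or tokens.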
Stars: 76
Forks: 6
Language: Python
License: Apache-2.0
Last pushed: Dec 31, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/JosefAlbers/VL-JEPA"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
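The same endpoint can also be called from Python. Below is a minimal stdlib-only sketch; the response schema is not documented on this page, so the example simply pretty-prints whatever JSON the service returns.

import json
import urllib.request

URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/JosefAlbers/VL-JEPA"

# Fetch and parse the JSON body from the quality API.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Schema is undocumented here, so print the full payload.
print(json.dumps(data, indent=2))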
Higher-rated alternatives
KimMeen/Time-LLM
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming...
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
bytedance/SALMONN
SALMONN family: A suite of advanced multi-modal LLMs
NVlabs/OmniVinci
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
fixie-ai/ultravox
A fast multimodal LLM for real-time voice