FoundationVision/VAR
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
This project introduces Visual Autoregressive Modeling (VAR), a technique that generates an image coarse to fine by predicting its next scale (resolution), analogous to how large language models predict the next token. It produces high-quality images from scratch; the released models are conditioned on ImageNet class labels rather than free-form text prompts. The tool suits researchers and practitioners in generative AI, computer vision, and digital media who need state-of-the-art image synthesis.
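To make "next-scale prediction" concrete, the sketch below shows the kind of coarse-to-fine token maps a VAR-style model is trained to predict one scale at a time. It is a toy under stated assumptions, not the repository's code: a feature map is quantized residually at increasing resolutions (1×1, 2×2, 4×4, 8×8), each scale encoding what the coarser scales missed. The function names, the average-pool/nearest-neighbour resizing, the random codebook, and the omission of VAR's learned per-scale convolution and of the transformer itself are all simplifications chosen for illustration.

```python
import numpy as np


def downsample(f, s):
    """Average-pool a (C, H, W) map to (C, s, s); assumes H == W and H % s == 0."""
    c, h, _ = f.shape
    k = h // s
    return f.reshape(c, s, k, s, k).mean(axis=(2, 4))


def upsample(z, size):
    """Nearest-neighbour upsample a (C, s, s) map to (C, size, size)."""
    k = size // z.shape[1]
    return z.repeat(k, axis=1).repeat(k, axis=2)


def quantize(z, codebook):
    """Return the (s, s) map of nearest codebook indices for a (C, s, s) map."""
    flat = z.reshape(z.shape[0], -1).T                      # (s*s, C)
    d = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (s*s, V)
    return d.argmin(axis=1).reshape(z.shape[1], z.shape[2])


def lookup(idx, codebook):
    """Map an (s, s) index grid back to (C, s, s) code vectors."""
    return codebook[idx].transpose(2, 0, 1)


def encode_multiscale(f, codebook, scales):
    """Hypothetical toy multi-scale residual tokenizer.

    Each scale quantizes the still-unexplained residual at a coarse
    resolution, then subtracts its upsampled reconstruction. The returned
    list of token maps is the coarse-to-fine sequence a next-scale model
    would generate autoregressively (scale k conditioned on scales < k).
    """
    size = f.shape[1]
    residual = f.copy()
    tokens = []
    for s in scales:
        r = quantize(downsample(residual, s), codebook)
        tokens.append(r)
        residual = residual - upsample(lookup(r, codebook), size)
    return tokens


def decode_multiscale(tokens, codebook, size):
    """Sum the upsampled contributions of every available scale."""
    f_hat = np.zeros((codebook.shape[1], size, size))
    for r in tokens:
        f_hat = f_hat + upsample(lookup(r, codebook), size)
    return f_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    f = rng.normal(size=(4, 8, 8))
    # Include a zero code so adding a scale can never increase the error.
    codebook = np.vstack([np.zeros((1, 4)), rng.normal(size=(63, 4))])
    tokens = encode_multiscale(f, codebook, scales=(1, 2, 4, 8))
    for k in range(1, len(tokens) + 1):
        err = np.mean((f - decode_multiscale(tokens[:k], codebook, 8)) ** 2)
        print(f"scales used: {k}, reconstruction MSE: {err:.4f}")
```

Decoding a prefix of the token list gives a progressively sharper approximation, which is why generation can proceed scale by scale: every token within one scale is predicted in parallel, and only the sequence of scales is autoregressive.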
Use this if you need to generate high-quality images (the released models are class-conditional on ImageNet) and want an autoregressive method that preserves fine detail and scales predictably with model size.
Not ideal if your primary goal is simple image editing or enhancement rather than generating entirely new visual content.
Stars: 8,641
Forks: 563
Language: Jupyter Notebook
License: MIT
Category:
Last pushed: Nov 10, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/diffusion/FoundationVision/VAR"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
Related models
NVlabs/Sana
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
nerdyrodent/VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use Colab.
huggingface/finetrainers
Scalable and memory-optimized training of diffusion models
AssemblyAI-Community/MinImagen
MinImagen: A minimal implementation of the Imagen text-to-image model
eps696/aphantasia
CLIP + FFT/DWT/RGB = text to image/video