FoundationVision/VAR

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!


Established

This project introduces Visual Autoregressive Modeling (VAR), an image generation technique that predicts the next scale (resolution) of an image, coarse to fine, much as large language models predict the next word. It generates high-quality images from scratch, conditioned on inputs such as class labels. It is aimed at researchers and practitioners in generative AI, computer vision, and digital media who need state-of-the-art image synthesis.
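The next-scale idea described above can be illustrated with a toy loop. This is a minimal sketch, not the repository's API: `predict_scale` is a hypothetical stand-in for the transformer, and the scale schedule and vocabulary size are assumptions chosen for illustration.

```python
import random

# Side lengths of the token maps, coarse to fine (assumed schedule).
SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)
VOCAB_SIZE = 4096  # assumed size of the discrete token vocabulary


def upsample_nearest(grid, size):
    """Resize a square grid of token ids to size x size (nearest neighbour)."""
    old = len(grid)
    return [[grid[r * old // size][c * old // size] for c in range(size)]
            for r in range(size)]


def predict_scale(context, size, rng):
    """Hypothetical stand-in for the model: return a size x size token map.

    A real model would condition on every coarser map in `context` and
    produce all tokens of the new map in one step. Here each token is
    either copied from the upsampled previous map or drawn at random.
    """
    if not context:
        return [[rng.randrange(VOCAB_SIZE) for _ in range(size)]
                for _ in range(size)]
    prior = upsample_nearest(context[-1], size)
    return [[tok if rng.random() < 0.5 else rng.randrange(VOCAB_SIZE)
             for tok in row]
            for row in prior]


def generate(scales=SCALES, seed=0):
    """Run one autoregressive step per scale and return all token maps."""
    rng = random.Random(seed)
    maps = []
    for size in scales:
        maps.append(predict_scale(maps, size, rng))
    return maps


token_maps = generate()
steps = len(token_maps)
total_tokens = sum(len(m) * len(m[0]) for m in token_maps)
print(steps, total_tokens)
```

With this schedule the loop takes 10 autoregressive steps, whereas generating the final 16 x 16 map one token at a time would take 256 steps.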

8,641 stars.

Use this if you need to generate high-quality images from scratch and want a method that preserves fine detail and scales well.

Not ideal if your primary goal is simple image editing or enhancement rather than generating entirely new visual content.

generative-AI image-synthesis computer-vision digital-content-creation AI-research

No package. No dependents.

Maintenance 6 / 25

Adoption 10 / 25

Maturity 16 / 25

Community 18 / 25


Stars: 8,641

Forks: 563

Language: Jupyter Notebook

License: MIT

Related models

NVlabs/Sana

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

nerdyrodent/VQGAN-CLIP

Just playing with getting VQGAN+CLIP running locally, rather than having to use colab.

huggingface/finetrainers

Scalable and memory-optimized training of diffusion models

AssemblyAI-Community/MinImagen

MinImagen: A minimal implementation of the Imagen text-to-image model

eps696/aphantasia

CLIP + FFT/DWT/RGB = text to image/video
