m2b3/CanViT-PyTorch
Reference implementation of the Canvas Vision Transformer from the paper "CanViT: Toward Active-Vision Foundation Models"
CanViT is a computer vision model that analyzes an image by focusing on specific areas over time, much as a person inspects a scene. It takes an image and a sequence of 'glimpses' (specific zoomed-in regions) and outputs a detailed, evolving understanding of the entire scene, along with classifications. It is aimed at researchers and practitioners building systems that interpret complex visual environments by actively exploring them.
Use this if you need a flexible vision model that can process visual information in a sequence of localized observations, building up a comprehensive understanding of a scene over time, even with high-resolution imagery.
Not ideal if you primarily work with single, static images for basic, whole-image classification without needing sequential, fine-grained analysis.
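To make the idea of a 'glimpse' concrete, here is a minimal sketch of cutting a sequence of zoomed-in regions out of one image. It is not CanViT's API: the names `glimpse_box` and `extract_glimpses`, the normalized `(cy, cx, scale)` parameterization, and the nested-list image are all assumptions chosen for illustration, and the model's own glimpse handling may differ.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # grayscale image stored as rows of pixel values


def glimpse_box(height: int, width: int, cy: float, cx: float,
                scale: float) -> Tuple[int, int, int, int]:
    """Return (top, left, bottom, right) pixel bounds of one glimpse.

    Hypothetical parameterization: (cy, cx) is the glimpse centre in
    normalized [0, 1] coordinates, and scale is the fraction of each
    image side that the glimpse covers (0 < scale <= 1).
    """
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    gh = max(1, round(height * scale))
    gw = max(1, round(width * scale))
    # Clamp the window so it always lies fully inside the image.
    top = min(max(round(cy * height - gh / 2), 0), height - gh)
    left = min(max(round(cx * width - gw / 2), 0), width - gw)
    return top, left, top + gh, left + gw


def extract_glimpses(image: Image,
                     glimpses: Sequence[Tuple[float, float, float]]) -> List[Image]:
    """Crop one region per (cy, cx, scale) triple, in the order given."""
    height, width = len(image), len(image[0])
    crops = []
    for cy, cx, scale in glimpses:
        top, left, bottom, right = glimpse_box(height, width, cy, cx, scale)
        crops.append([row[left:right] for row in image[top:bottom]])
    return crops
```

A sequence such as `[(0.5, 0.5, 1.0), (1.0, 1.0, 0.25)]` would first view the whole image and then zoom into the bottom-right corner. A model of the kind described above would consume such crops one at a time, typically after resizing each to a fixed input size.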
Stars: 13
Forks: 1
Language: Python
License: —
Category:
Last pushed: Mar 25, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/computer-vision/m2b3/CanViT-PyTorch"
Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000/day.
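The same request can be made from Python using only the standard library. This is a sketch under assumptions: the `category/owner/repo` path pattern is inferred from the single URL shown above, the response is assumed to be JSON (the page does not state the format), and the helper names `quality_url` and `fetch_quality` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(category: str, owner: str, repo: str) -> str:
    """Build the endpoint URL; the path layout is inferred from one example."""
    parts = (quote(part, safe="") for part in (category, owner, repo))
    return BASE_URL + "/" + "/".join(parts)


def fetch_quality(category: str, owner: str, repo: str, timeout: float = 10.0):
    """Fetch the record for one repository.

    Assumes the body is JSON; json.loads raises ValueError if it is not.
    """
    request = urllib.request.Request(
        quality_url(category, owner, repo),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (performs a network request, so it is left commented out):
# data = fetch_quality("computer-vision", "m2b3", "CanViT-PyTorch")
```

Since unauthenticated access is limited to 100 requests/day, it may be worth caching responses locally rather than calling `fetch_quality` repeatedly.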
Higher-rated alternatives
lucidrains/vit-pytorch
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with...
roflcoopter/viseron
Self-hosted, local only NVR and AI Computer Vision software. With features such as object...
blakeblackshear/frigate
NVR with realtime local object detection for IP cameras
levan92/deep_sort_realtime
A really more real-time adaptation of deep sort
notAI-tech/NudeNet
Lightweight nudity detection