Jaykef/min-patchnizer
Minimal, clean code for video/image "patchnization" — a preprocessing step commonly used to tokenize visual data for a Transformer encoder.
This project helps machine learning engineers preprocess video and image data for computer vision tasks. It takes raw video files or images and transforms them into a sequence of numerically embedded "patches," which are then ready to be fed into a Vision Transformer model for training or inference. The tool is aimed at practitioners working with deep learning models for visual understanding.
No commits in the last 6 months.
Use this if you need to convert video or image data into a tokenized, sequence-based format suitable for Vision Transformer encoders.
Not ideal if you are looking for a general-purpose image processing library or a solution that doesn't specifically target Vision Transformer inputs.
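The repository's exact implementation isn't reproduced here, but the core patchification step a Vision Transformer front-end performs is standard: split an H×W×C image into non-overlapping P×P patches and flatten each into a vector. A minimal NumPy sketch of that general idea (the function name and shapes are illustrative, not taken from this repo):

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into a sequence of flattened patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image dims must divide evenly"
    # Reshape into a grid of patches, then bring the two grid axes together.
    patches = image.reshape(h // patch_size, patch_size,
                            w // patch_size, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, P, P, C)
    return patches.reshape(-1, patch_size * patch_size * c)

# A 224x224 RGB image with 16x16 patches yields 196 patches of dimension 768,
# the standard ViT-Base input sequence.
seq = patchify(np.zeros((224, 224, 3)), 16)
print(seq.shape)  # (196, 768)
```

For video, the same operation is typically applied per frame (or per spatio-temporal cube), producing one patch sequence per frame that is then concatenated along the sequence axis.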
Stars: 11
Forks: 1
Language: Python
License: —
Category: —
Last pushed: May 16, 2024
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/nlp/Jaykef/min-patchnizer"
Open to everyone: 100 requests/day with no key required. A free key raises the limit to 1,000/day.
Higher-rated alternatives
yaserkl/RLSeq2Seq
Deep Reinforcement Learning For Sequence to Sequence Models
kefirski/pytorch_RVAE
Recurrent Variational Autoencoder that generates sequential data implemented with pytorch
georgian-io/Multimodal-Toolkit
Multimodal model for text and tabular data with HuggingFace transformers as building block for text data
ctr4si/A-Hierarchical-Latent-Structure-for-Variational-Conversation-Modeling
PyTorch Implementation of "A Hierarchical Latent Structure for Variational Conversation...
nurpeiis/LeakGAN-PyTorch
A simple implementation of LeakGAN in PyTorch