ziqipang/LM4VisualEncoding

[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"

Score: 35 / 100 (Emerging)

This project offers machine learning researchers and practitioners a novel way to improve the performance of visual models. It inserts frozen transformer layers from pre-trained language models into existing visual encoders, helping the models identify and focus on salient visual features and yielding more accurate classification of images, point clouds, and actions.
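The core idea can be sketched in a few lines: visual tokens are projected into the language model's hidden dimension, passed through a frozen transformer block, and projected back. This is a minimal NumPy sketch, not the repository's implementation; the dimensions, the single-head attention, and the adapter names (`W_in`, `W_out`) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class FrozenSelfAttention:
    """Stand-in for a frozen LLM transformer layer: weights fixed after init."""
    def __init__(self, dim):
        s = 1.0 / np.sqrt(dim)
        self.Wq = rng.normal(0, s, (dim, dim))
        self.Wk = rng.normal(0, s, (dim, dim))
        self.Wv = rng.normal(0, s, (dim, dim))

    def __call__(self, x):
        q, k, v = x @ self.Wq, x @ self.Wk, x @ self.Wv
        attn = softmax(q @ k.T / np.sqrt(x.shape[-1]))
        return x + attn @ v  # residual connection, as in a transformer block

# Trainable linear adapters around the frozen block (hypothetical dims)
vis_dim, llm_dim = 64, 128
W_in = rng.normal(0, 1 / np.sqrt(vis_dim), (vis_dim, llm_dim))   # trainable
W_out = rng.normal(0, 1 / np.sqrt(llm_dim), (llm_dim, vis_dim))  # trainable
frozen_block = FrozenSelfAttention(llm_dim)                      # frozen

tokens = rng.normal(size=(16, vis_dim))  # 16 visual tokens from the encoder
out = frozen_block(tokens @ W_in) @ W_out
print(out.shape)  # (16, 64): back in the visual encoder's dimension
```

Only the two projections would be trained; the attention weights stay fixed, which is what makes the language-model layer cheap to reuse.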

246 stars. No commits in the last 6 months.

Use this if you are a machine learning researcher or engineer working on visual recognition tasks and want to explore innovative methods to boost model accuracy by leveraging language model capabilities.

Not ideal if you are looking for a plug-and-play solution for non-visual tasks or if you are not comfortable with modifying existing deep learning architectures.

image-classification video-analysis 3d-data-processing deep-learning-research computer-vision
Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 9 / 25


Stars: 246
Forks: 8
Language: Python
License: MIT
Last pushed: Jan 17, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ziqipang/LM4VisualEncoding"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
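The same request can be made from Python. This is a minimal sketch using only the standard library; the URL components come from the curl example above, but the response fields are not documented here, so the parsing step is left as a commented assumption.

```python
import json
import urllib.request

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(ecosystem: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL for a given repository."""
    return f"{BASE}/{ecosystem}/{owner}/{repo}"

url = quality_url("transformers", "ziqipang", "LM4VisualEncoding")
print(url)

# Fetching the data (response schema is an assumption -- inspect the real
# payload before relying on any field names):
# with urllib.request.urlopen(url) as resp:
#     data = json.load(resp)
```

An API key, once obtained, would typically be passed as a header or query parameter; the listing above does not specify which, so check the service's documentation.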