showlab/VisInContext

Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Overall score: 24 / 100 (Experimental)

This tool helps researchers and AI practitioners extend the amount of text context their multi-modal models can process. It takes existing multi-modal models and datasets and integrates visual tokens to expand the effective textual input capacity. The result is a model that can understand and generate responses from much longer text inputs, which is particularly useful when combining large language models with images.
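To make that concrete, here is a minimal, hypothetical sketch of the idea suggested by the paper title: long in-context text is rendered as an image, encoded by a vision encoder, and the resulting visual tokens are concatenated with the ordinary text tokens. Every function and class name below is an illustrative placeholder, not the repository's actual API.

# Hypothetical sketch only; names and shapes are illustrative, not VisInContext's API.
import textwrap

import numpy as np
import torch
import torch.nn as nn
from PIL import Image, ImageDraw


def render_text_as_image(text: str, size=(448, 448)) -> Image.Image:
    # Draw a chunk of text onto a blank canvas (assumed preprocessing step).
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    wrapped = "\n".join(textwrap.wrap(text, width=60))
    draw.multiline_text((8, 8), wrapped, fill="black")
    return img


class PatchEmbedEncoder(nn.Module):
    # Stand-in for a real vision encoder: a ViT-style patch embedding only.
    def __init__(self, dim: int = 768, patch: int = 32):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)

    def forward(self, pixels: torch.Tensor) -> torch.Tensor:
        # (B, 3, H, W) -> (B, num_patches, dim)
        return self.patch_embed(pixels).flatten(2).transpose(1, 2)


# Text that would overflow the language model's context window on its own.
long_context = "A very long reference document goes here. " * 200
image = render_text_as_image(long_context)
pixels = torch.from_numpy(np.array(image)).permute(2, 0, 1).float().unsqueeze(0) / 255.0

encoder = PatchEmbedEncoder()
visual_tokens = encoder(pixels)        # (1, 196, 768) tokens for the rendered text
text_tokens = torch.randn(1, 32, 768)  # placeholder tokens for the short prompt

# A multi-modal LM would attend over the concatenated sequence, so the rendered
# text adds context without consuming the text-token budget.
model_input = torch.cat([visual_tokens, text_tokens], dim=1)
print(model_input.shape)  # torch.Size([1, 228, 768])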

No commits in the last 6 months.

Use this if you are building or evaluating multi-modal AI models and frequently encounter limitations due to short text context windows.

Not ideal if your primary goal is to improve image generation quality rather than extending textual understanding within multi-modal models.

Tags: multi-modal AI, large language models, AI model training, context window extension, AI research
No License · Stale (6 months) · No Package · No Dependents

Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 8 / 25
Community: 9 / 25

How are scores calculated? Each of the four components above is scored out of 25, and together they sum to the overall score: 0 + 7 + 8 + 9 = 24 out of 100.

Stars: 28

Forks: 3

Language: Python

License: None

Last pushed: Oct 30, 2024

Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/showlab/VisInContext"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
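If curl is not convenient, the same endpoint can be queried from Python. The sketch below assumes the response body is JSON (the exact schema is not documented here) and stays within the keyless free tier.

# Sketch: query the same quality endpoint from Python instead of curl.
# Assumes the response is JSON; adjust parsing if the format differs.
import requests

url = "https://pt-edge.onrender.com/api/v1/quality/transformers/showlab/VisInContext"
resp = requests.get(url, timeout=30)
resp.raise_for_status()  # keyless free tier allows 100 requests/day
print(resp.json())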