Haiyang-W/TokenFormer
[ICLR 2025 Spotlight] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
This project offers a novel way to build large-scale AI models for tasks such as language modeling and image classification. Instead of the fixed linear projections of a standard Transformer, it treats model parameters as tokens: input tokens interact with learnable parameter tokens through an attention-style mechanism, so the network attends to both the data and its own parameters. Because capacity lives in a set of parameter tokens, the model can be scaled up incrementally by adding tokens rather than retraining from scratch. This is for AI researchers and engineers who develop and train foundation models.
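The token-parameter attention idea described above can be sketched minimally as follows. This is an illustrative simplification, not the repository's implementation: the function name `pattention` and the use of a plain softmax (TokenFormer uses a modified, GeLU-based normalization) are assumptions for clarity.

```python
import numpy as np

def pattention(x, key_params, value_params):
    """Token-parameter attention sketch (simplified; not the repo's exact code).

    x:            (seq_len, d_in)        input tokens
    key_params:   (n_params, d_in)       learnable parameter tokens (keys)
    value_params: (n_params, d_out)      learnable parameter tokens (values)
    """
    scores = x @ key_params.T  # each input token scores every parameter token
    # Plain softmax normalization for illustration; the paper replaces this
    # with a modified normalization to stabilize training.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ value_params  # (seq_len, d_out)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
kp, vp = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
out_small = pattention(x, kp, vp)

# Scaling: append new parameter tokens; input/output shapes are unchanged,
# so the model grows without changing its interface.
kp_big = np.vstack([kp, rng.normal(size=(8, 8))])
vp_big = np.vstack([vp, rng.normal(size=(8, 8))])
out_big = pattention(x, kp_big, vp_big)
```

The key property is visible in the last lines: capacity is a function of the number of parameter tokens, so growing the model is a matter of stacking more rows into `key_params` and `value_params`.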
588 stars. No commits in the last 6 months.
Use this if you are developing large AI models and need a highly flexible and scalable architecture that can be incrementally improved without retraining from scratch.
Not ideal if you are looking for an off-the-shelf solution for a specific application without delving into core AI model architecture.
Stars: 588
Forks: 43
Language: Python
License: Apache-2.0
Category:
Last pushed: Feb 11, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/Haiyang-W/TokenFormer"
Open to everyone: 100 requests/day with no key; a free key raises the limit to 1,000/day.
Higher-rated alternatives
DaoD/INTERS
This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in...
declare-lab/instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca...
hkust-nlp/deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
kehanlu/DeSTA2
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model...
TIGER-AI-Lab/VisualWebInstruct
The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web...