All Transformer Models
7,795 models ranked by quality score · Page 9 of 78
| # | Model | Score | Tier |
|---|---|---|---|
| 801 | freshllms/freshqa: Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) | | Emerging |
| 802 | yuchenlin/LLM-Blender: [ACL 2023] We introduce LLM-Blender, an innovative ensembling framework to... | | Emerging |
| 803 | Victorwz/LongMem: Official implementation of our NeurIPS 2023 paper "Augmenting Language... | | Emerging |
| 804 | XXO47OXX/layer-scan: Automated LLM layer duplication config scanner: find the optimal (i,j) for... | | Emerging |
| 805 | SkalskiP/vlms-zero-to-hero: This series will take you on a journey from the fundamentals of NLP and... | | Emerging |
| 806 | joyehuang/minimind-notes: 🚀 [Building an LLM from Scratch] A minimalist guide to the principles and practice of training large models, with core code and comparison experiments for Transformer, Pretraining, and SFT. \| A... | | Emerging |
| 807 | mdrokz/rust-llama.cpp: llama.cpp Rust bindings | | Emerging |
| 808 | ariya/ask-llm: Interact with any LLM service | | Emerging |
| 809 | Kaushalya/medclip: A multi-modal CLIP model trained on the medical dataset ROCO | | Emerging |
| 810 | sgrvinod/chess-transformers: Teaching transformers to play chess | | Emerging |
| 811 | 4AI/LS-LLaMA: A Simple but Powerful SOTA NER Model \| Official Code For Label Supervised... | | Emerging |
| 812 | ictnlp/Stream-Omni: Stream-Omni is a GPT-4o-like language-vision-speech chatbot that... | | Emerging |
| 813 | EncrEor/rlm-claude: Recursive Language Models for Claude Code: infinite memory solution... | | Emerging |
| 814 | vijaydwivedi75/gnn-lspe: Source code for GNN-LSPE (Graph Neural Networks with Learnable Structural... | | Emerging |
| 815 | flipkart-incubator/spark-transformers: Spark-Transformers: library for exporting Apache Spark MLlib models to use... | | Emerging |
| 816 | Arunprakash-A/DL-Pytorch-Workshop: Develop DL models using PyTorch and Hugging Face | | Emerging |
| 817 | monologg/KoBERT-KorQuAD: Korean MRC (KorQuAD) with KoBERT | | Emerging |
| 818 | alohays/awesome-visual-representation-learning-with-transformers: Awesome Transformers (self-attention) in Computer Vision | | Emerging |
| 819 | 1b5d/llm-api: Run any Large Language Model behind a unified API | | Emerging |
| 820 | mbzuai-oryx/MobiLlama: [ICLR-2025-SLLM Spotlight 🔥] MobiLlama: Small Language Model tailored for... | | Emerging |
| 821 | chengzeyi/ParaAttention: https://wavespeed.ai/ Context parallel attention that accelerates DiT model... | | Emerging |
| 822 | amazon-science/tanl: Structured Prediction as Translation between Augmented Natural Languages | | Emerging |
| 823 | GyanPrakashkushwaha/DataScience: Everything you need for data science. | | Emerging |
| 824 | amanvirparhar/weebo: A real-time speech-to-speech chatbot powered by Whisper Small, Llama 3.2,... | | Emerging |
| 825 | donaldafeith/Pytorch_Merge: Merge LLMs that are split into parts | | Emerging |
| 826 | absadiki/pyllamacpp: Python bindings for llama.cpp | | Emerging |
| 827 | xinzhanguo/hellollm: Pre-train a new LLM | | Emerging |
| 828 | snap-research/EfficientFormer: EfficientFormerV2 [ICCV 2023] & EfficientFormer [NeurIPS 2022] | | Emerging |
| 829 | buaacyw/MeshAnythingV2: [ICCV 2025] From anything to mesh like human artists. Official impl. of... | | Emerging |
| 830 | OPTML-Group/Unlearn-Simple: [NeurIPS 2025] Official repo for "Simplicity Prevails: Rethinking Negative... | | Emerging |
| 831 | anseryuer/Local_LLM_Deployment_Guide_Chinese: A Chinese-language tutorial on deploying large language models locally | | Emerging |
| 832 | invictus717/MetaTransformer: Meta-Transformer for Unified Multimodal Learning | | Emerging |
| 833 | MIC-DKFZ/MedNeXt: [MICCAI 2023] MedNeXt is a fully ConvNeXt architecture for 3D medical image... | | Emerging |
| 834 | RManLuo/reasoning-on-graphs: Official implementation of the ICLR 2024 paper "Reasoning on Graphs: Faithful... | | Emerging |
| 835 | dccuchile/beto: BETO: Spanish version of the BERT model | | Emerging |
| 836 | JoaoLages/RATransformers: RATransformers 🐭: make your transformer (like BERT, RoBERTa, GPT-2 and T5)... | | Emerging |
| 837 | poloclub/llm-landscape: NeurIPS'24: LLM Safety Landscape | | Emerging |
| 838 | gaussalgo/adaptor: ACL 2022: Adaptor: a library to easily adapt a language model to your own... | | Emerging |
| 839 | yesbhautik/Talk-with-PDF: An interactive AI chatbot for querying and discussing the contents of PDF... | | Emerging |
| 840 | jerry1993-tech/Cornucopia-LLaMA-Fin-Chinese: 聚宝盆 (Cornucopia):... | | Emerging |
| 841 | virtualramblas/Domain-Specific-Small-Language-Models: Repository for the companion Colab notebook of the Domain-Specific Small... | | Emerging |
| 842 | ckiplab/ckip-transformers: CKIP Transformers | | Emerging |
| 843 | HUST-NingKang-Lab/MGM: MGM (Microbial General Model) as a large-scale pretrained language model... | | Emerging |
| 844 | zhudotexe/fanoutqa: Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering... | | Emerging |
| 845 | git-disl/Vaccine: This is the official code for the paper "Vaccine: Perturbation-aware... | | Emerging |
| 846 | SakanaAI/text-to-lora: Hypernetworks that adapt LLMs for specific benchmark tasks using only... | | Emerging |
| 847 | iaalm/llama-api-server: An OpenAI API-compatible REST server for LLaMA. | | Emerging |
| 848 | jianzhnie/LLamaTuner: Easy and efficient fine-tuning of LLMs. (Supports LLaMA, LLaMA 2, LLaMA 3, Qwen,... | | Emerging |
| 849 | uclaml/SPIN: The official implementation of Self-Play Fine-Tuning (SPIN) | | Emerging |
| 850 | rkansal47/MPGAN: The message passing GAN https://arxiv.org/abs/2106.11535 and generative... | | Emerging |
| 851 | JAMESYJL/ShapeLLM-Omni: [NeurIPS 2025 Spotlight] A Native Multimodal LLM for 3D Generation and Understanding | | Emerging |
| 852 | linjieli222/HERO: Research code for the EMNLP 2020 paper "HERO: Hierarchical Encoder for... | | Emerging |
| 853 | FMInference/FlexLLMGen: Running large language models on a single GPU for throughput-oriented scenarios. | | Emerging |
| 854 | AI-Hypercomputer/jetstream-pytorch: PyTorch/XLA integration with JetStream (https://github.com/google/JetStream)... | | Emerging |
| 855 | sagorbrur/bntransformer: Bengali transformer using transformers | | Emerging |
| 856 | bytedance/effective_transformer: Running BERT without Padding | | Emerging |
| 857 | google-research/long-range-arena: Long Range Arena for Benchmarking Efficient Transformers | | Emerging |
| 858 | xianglin226/Benchmarking-Single-Cell-Perturbation: Single-Cell (Perturbation) Model Library | | Emerging |
| 859 | 0hq/WebGPT: Run a GPT model in the browser with WebGPU. An implementation of GPT inference... | | Emerging |
| 860 | kamalkraj/e5-mistral-7b-instruct: Fine-tune mistral-7b-instruct for sentence embeddings | | Emerging |
| 861 | IntelLabs/causality-lab: Causal discovery algorithms and tools for implementing new ones | | Emerging |
| 862 | backprop-ai/backprop: Backprop makes it simple to use, fine-tune, and deploy state-of-the-art ML models. | | Emerging |
| 863 | pytorch/torchchat: Run PyTorch LLMs locally on servers, desktop and mobile | | Emerging |
| 864 | salesforce/ETSformer: PyTorch code for ETSformer: Exponential Smoothing Transformers for... | | Emerging |
| 865 | LibreTranslate/Locomotive: Toolkit for training/converting LibreTranslate-compatible language models 🚂 | | Emerging |
| 866 | spcl/x1: Official implementation of "Reasoning Language Models: A Blueprint" | | Emerging |
| 867 | hao-ai-lab/Dynasor: [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model... | | Emerging |
| 868 | bhavsarpratik/easy-transformers: Utility functions to work with transformers | | Emerging |
| 869 | gluonfield/enchanted: Enchanted is an iOS and macOS app for chatting with private self-hosted... | | Emerging |
| 870 | thuml/AutoTimes: Official implementation for "AutoTimes: Autoregressive Time Series... | | Emerging |
| 871 | rohan-paul/LLM-FineTuning-Large-Language-Models: LLM (Large Language Model) fine-tuning | | Emerging |
| 872 | salesforce/CodeTF: CodeTF: One-stop Transformer Library for State-of-the-art Code LLM | | Emerging |
| 873 | kyegomez/SparseAttention: PyTorch implementation of the sparse attention from the paper "Generating... | | Emerging |
| 874 | Emmi-AI/noether: Deep-learning framework for Engineering AI. Built on transformer building... | | Emerging |
| 875 | InternLM/CapRL: [ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image... | | Emerging |
| 876 | Atome-FE/llama-node: Believe in AI democratization. LLaMA for Node.js backed by llama-rs,... | | Emerging |
| 877 | snap-stanford/relgt: Relational Graph Transformer | | Emerging |
| 878 | sinanuozdemir/oreilly-pytorch-dl: Code for Deep Learning for Modern AI | | Emerging |
| 879 | tlkh/t2t-tuner: Convenient Text-to-Text Training for Transformers | | Emerging |
| 880 | oValach/RailSafeNet: Repository for the paper "RailSafeNet: Visual Scene Understanding for Tram Safety" | | Emerging |
| 881 | iPieter/RobBERT: A Dutch RoBERTa-based language model | | Emerging |
| 882 | ddzipp/AutoAudit: AutoAudit: the LLM for cyber security | | Emerging |
| 883 | ContextLab/llm-stylometry: LLM-based approach for distinguishing the writings of different authors. | | Emerging |
| 884 | elicit/machine-learning-list: A curriculum for learning about foundation models, from scratch to the frontier | | Emerging |
| 885 | JetRunner/BERT-of-Theseus: ⛵️ The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT... | | Emerging |
| 886 | gitabtion/SoftMaskedBert-PyTorch: 🙈 An unofficial implementation of SoftMaskedBert based on huggingface/transformers. | | Emerging |
| 887 | julienkay/com.doji.transformers: A Unity package to run pretrained transformer models with Unity Sentis | | Emerging |
| 888 | ucbrise/graphtrans: Representing Long-Range Context for Graph Neural Networks with Global Attention | | Emerging |
| 889 | bayesgroup/code_transformers: Empirical Study of Transformers for Source Code & A Simple Approach for... | | Emerging |
| 890 | deep-diver/llamaduo: [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration... | | Emerging |
| 891 | IST-DASLab/marlin: FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up... | | Emerging |
| 892 | salcc/QuantumTransformers: Quantum Transformers for High Energy Physics Analysis at the Large Hadron Collider | | Emerging |
| 893 | MozerWang/AMPO: [ICLR 2026] Adaptive Social Learning via Mode Policy Optimization for Language Agents | | Emerging |
| 894 | gupta-abhay/pytorch-vit: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | | Emerging |
| 895 | princeton-nlp/SimPO: [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward | | Emerging |
| 896 | turtlesoupy/this-word-does-not-exist: This Word Does Not Exist | | Emerging |
| 897 | sinanuozdemir/oreilly-huggingface-tour: A Crash Course in Hugging Face | | Emerging |
| 898 | PureBee/purebee: A GPU defined in software. Runs Llama 3.2 1B at 3.6 tok/sec. Zero dependencies. | | Emerging |
| 899 | Kartik-3004/SegFace: [AAAI 25] SegFace: Face Segmentation of Long-tail classes | | Emerging |
| 900 | kevinMEH/keyscan: Keyscan: AI-powered API key scanner for GitHub Gists. | | Emerging |