IBM/DEFT
Official PyTorch code for "From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers" (AAAI 2025)
This project helps machine learning engineers fine-tune large language models more efficiently. It takes a pre-trained language model and applies a novel density loss during parameter-efficient fine-tuning (PEFT) with methods such as LoRA or Adapters. The output is a fine-tuned model with significantly reduced activation density, which can enable faster inference on specialized hardware while maintaining performance on tasks such as text classification and question answering.
No commits in the last 6 months.
Use this if you are a machine learning engineer looking to reduce the computational cost and improve the inference speed of your fine-tuned transformer models without sacrificing performance.
Not ideal if you are a non-developer, or if your goal is simply to use an off-the-shelf fine-tuned model without optimizing its efficiency for deployment.
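The repository's actual density loss is defined in the paper, not on this page. As a rough illustration only, the general idea of penalizing activation density alongside a task loss can be sketched as below; the function names, the hard threshold, and the L1 proxy are all assumptions, not the authors' method:

```python
def activation_density(activations, threshold=1e-6):
    # Fraction of units that are "active", i.e. magnitude above a small threshold.
    # This is the quantity DEFT aims to reduce (a hard count, so not differentiable).
    active = sum(1 for a in activations if abs(a) > threshold)
    return active / len(activations)

def total_loss(task_loss, activations, alpha=0.1):
    # Illustrative combined objective: task loss plus a sparsity penalty.
    # Mean |activation| (L1) is a common differentiable proxy for density;
    # alpha trades off task performance against activation sparsity.
    l1 = sum(abs(a) for a in activations) / len(activations)
    return task_loss + alpha * l1
```

In actual training the penalty would be computed on intermediate transformer activations with tensor operations, but the structure — task loss plus a weighted sparsity term — is the same.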
Stars: 7
Forks: —
Language: Python
License: Apache-2.0
Category: ml-frameworks
Last pushed: Sep 18, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/IBM/DEFT"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
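The same request can be made from Python with only the standard library. This is a minimal sketch assuming the endpoint returns JSON; the response schema is not documented here, and `quality_url` and `fetch_quality` are illustrative names:

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category, owner, repo):
    # Build the quality-endpoint URL for a repository,
    # e.g. .../quality/ml-frameworks/IBM/DEFT
    return f"{API_BASE}/{category}/{owner}/{repo}"

def fetch_quality(category, owner, repo):
    # Fetch and decode the quality record (assumes a JSON response body).
    # No key needed for up to 100 requests/day.
    with urllib.request.urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

url = quality_url("ml-frameworks", "IBM", "DEFT")
```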
Higher-rated alternatives
philipperemy/keras-attention
Keras Attention Layer (Luong and Bahdanau scores).
tatp22/linformer-pytorch
My take on a practical implementation of Linformer for Pytorch.
ematvey/hierarchical-attention-networks
Document classification with Hierarchical Attention Networks in TensorFlow. WARNING: project is...
datalogue/keras-attention
Visualizing RNNs using the attention mechanism
thushv89/attention_keras
Keras Layer implementation of Attention for Sequential models