csm9493/efficient-llm-unlearning
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs (ICLR 2025)
This project helps machine learning engineers and researchers remove specific, undesirable information from large language models (LLMs) without retraining them from scratch. You provide a pre-trained LLM and the data it should "forget," and it outputs a modified LLM that no longer retains that knowledge, along with metrics for evaluating the unlearning process. This is useful for teams responsible for model compliance or for updating knowledge in deployed LLMs.
No commits in the last 6 months.
Use this if you need to efficiently remove specific training data or facts from a large language model to address privacy concerns, correct misinformation, or update outdated information.
Not ideal if you aren't working directly with training or fine-tuning large language models, or if you need to erase an entire broad category of information, which would require extensive retraining.
Stars: 13
Forks: 1
Language: Python
License: MIT
Category:
Last pushed: Apr 04, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/csm9493/efficient-llm-unlearning"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
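The same endpoint can also be called programmatically. A minimal Python sketch using only the standard library (the response schema is not documented here, so the code simply decodes whatever JSON the endpoint returns):

```python
# Minimal sketch of calling the quality API from Python.
# The endpoint URL comes from the curl example above; the shape of the
# returned JSON is an assumption and should be inspected before relying on it.
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"

def quality_url(repo: str) -> str:
    """Build the endpoint URL for an 'owner/name' repository slug."""
    return f"{API_BASE}/{repo}"

def fetch_quality(repo: str, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON payload for a repo (requires network access)."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(quality_url("csm9493/efficient-llm-unlearning"))
```

With no API key this falls under the 100 requests/day anonymous limit, so cache responses rather than polling.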
Higher-rated alternatives
stair-lab/mlhp — Machine Learning from Human Preferences
princeton-nlp/SimPO — [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO — The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model — [ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice — Official implementation of Bootstrapping Language Models via DPO Implicit Rewards