chrisliu298/llm-unlearn-eco
[NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts
This project lets large language model (LLM) operators control what their models should not reveal without retraining the entire model. You provide an existing LLM and specify the information or entities it should forget. The system then screens incoming prompts, detects those related to the forbidden knowledge, and corrupts the prompt's input embeddings so the LLM cannot generate responses grounded in that knowledge. This tool is for anyone managing LLMs who needs to prevent their models from generating specific sensitive, copyrighted, or inappropriate content.
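The gating idea described above can be sketched in a few lines: a classifier decides whether a prompt touches forgotten knowledge, and only flagged prompts have their embeddings corrupted before generation. This is a minimal illustration, not the repository's implementation; the keyword matcher, the Gaussian noise, and all names (`guard_prompt`, `sigma`, etc.) are placeholder assumptions, whereas the actual method uses a learned classifier and learned corruptions.

```python
import random

random.seed(0)


def contains_forbidden(prompt, forbidden_terms):
    # Stand-in for the method's prompt classifier: naive keyword matching.
    return any(term in prompt.lower() for term in forbidden_terms)


def corrupt_embeddings(embeddings, sigma=1.0):
    # Add Gaussian noise to every embedding dimension. The real corruption
    # is optimized rather than random; this is only an illustrative placeholder.
    return [[x + random.gauss(0.0, sigma) for x in vec] for vec in embeddings]


def guard_prompt(prompt, embeddings, forbidden_terms):
    # Corrupt the prompt's token embeddings only when the classifier flags it;
    # benign prompts pass through untouched, preserving general ability.
    if contains_forbidden(prompt, forbidden_terms):
        return corrupt_embeddings(embeddings)
    return embeddings
```

In use, the guarded embeddings (rather than the originals) would be fed to the frozen LLM, so the model's weights never change and unlearning costs only the classifier plus a cheap per-prompt transformation.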
No commits in the last 6 months.
Use this if you need a lightweight and efficient way to make a powerful, pre-trained LLM 'forget' specific information or topics without costly and time-consuming retraining.
Not ideal if you need to completely erase knowledge from an LLM's core weights or if you are working with open-source models where full model fine-tuning is feasible and preferred.
Stars
38
Forks
4
Language
Python
License
—
Category
Last pushed
Sep 26, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/chrisliu298/llm-unlearn-eco"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards