chrisliu298/llm-unlearn-eco
[NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts
This project lets large language model (LLM) operators control what their models should not reveal without retraining the entire model. You provide an existing LLM and specify the information or entities it should forget. The system then screens incoming prompts, detects those related to the forbidden knowledge, and corrupts the prompt's input embeddings so the LLM cannot generate responses grounded in that knowledge. This tool is for anyone managing LLMs who needs to prevent their models from generating specific sensitive, copyrighted, or inappropriate content.
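The gating idea described above can be sketched in a few lines: a classifier decides whether a prompt touches forgotten knowledge, and only flagged prompts have their embeddings corrupted before generation. This is a minimal illustration, not the repository's implementation; the keyword matcher, the Gaussian noise, and all names (`guard_prompt`, `sigma`, etc.) are placeholder assumptions, whereas the actual method uses a learned classifier and learned corruptions.

```python
import random

random.seed(0)


def contains_forbidden(prompt, forbidden_terms):
    # Stand-in for the method's prompt classifier: naive keyword matching.
    return any(term in prompt.lower() for term in forbidden_terms)


def corrupt_embeddings(embeddings, sigma=1.0):
    # Add Gaussian noise to every embedding dimension. The real corruption
    # is optimized rather than random; this is only an illustrative placeholder.
    return [[x + random.gauss(0.0, sigma) for x in vec] for vec in embeddings]


def guard_prompt(prompt, embeddings, forbidden_terms):
    # Corrupt the prompt's token embeddings only when the classifier flags it;
    # benign prompts pass through untouched, preserving general ability.
    if contains_forbidden(prompt, forbidden_terms):
        return corrupt_embeddings(embeddings)
    return embeddings
```

In use, the guarded embeddings (rather than the originals) would be fed to the frozen LLM, so the model's weights never change and unlearning costs only the classifier plus a cheap per-prompt transformation.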
No commits in the last 6 months.
Use this if you need a lightweight and efficient way to make a powerful, pre-trained LLM 'forget' specific information or topics without costly and time-consuming retraining.
Not ideal if you need to completely erase knowledge from an LLM's core weights or if you are working with open-source models where full model fine-tuning is feasible and preferred.
Stars
38
Forks
4
Language
Python
License
—
Category
Last pushed
Sep 26, 2024
Commits (30d)
0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/chrisliu298/llm-unlearn-eco"
Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards