molereddy/Alternate-Preference-Optimization
[COLING 2025] code for "Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models".
This project helps AI developers and researchers refine large language models (LLMs) by selectively removing specific factual knowledge without damaging other capabilities. You provide a trained LLM and specify the information to unlearn; the output is a modified model that no longer reproduces those facts, ready for deployment or further evaluation. It suits teams working on responsible AI development or model fine-tuning.
No commits in the last 6 months.
Use this if you need to erase particular factual information from a large language model to enhance privacy, reduce bias, or correct outdated data.
Not ideal if you're looking for a simple, no-code way to filter LLM outputs or prevent a model from generating certain content without altering its underlying knowledge.
Stars: 10
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jan 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/molereddy/Alternate-Preference-Optimization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
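For programmatic access from Python, a minimal sketch using only the standard library. This assumes the endpoint returns JSON; the response schema and any API-key header name are not documented here, so the fetch helper is a best-effort guess (the free tier needs no key):

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub repo slug like 'owner/name'."""
    return f"{BASE}/{repo}"


def fetch_quality(repo: str, timeout: int = 10) -> dict:
    """Fetch quality data for a repo; assumes the endpoint returns a JSON object."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(quality_url("molereddy/Alternate-Preference-Optimization"))
```

`fetch_quality` mirrors the curl call; swap in an authenticated request if you obtain a free key for the higher rate limit.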
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards