molereddy/Alternate-Preference-Optimization
[COLING 2025] code for "Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models".
This project helps AI developers and researchers refine large language models (LLMs) by selectively removing specific factual knowledge without damaging other capabilities. You provide a trained LLM and specify the information to unlearn; the output is a modified model that no longer reproduces those facts, ready for deployment or further evaluation. It suits teams working on responsible AI development or model fine-tuning.
No commits in the last 6 months.
Use this if you need to erase particular factual information from a large language model to enhance privacy, reduce bias, or correct outdated data.
Not ideal if you're looking for a simple, no-code way to filter LLM outputs or prevent a model from generating certain content without altering its underlying knowledge.
Stars: 10
Forks: —
Language: Python
License: —
Category: —
Last pushed: Jan 14, 2025
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/transformers/molereddy/Alternate-Preference-Optimization"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
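For programmatic access from Python, a minimal sketch using only the standard library. This assumes the endpoint returns JSON; the response schema and any API-key header name are not documented here, so the fetch helper is a best-effort guess (the free tier needs no key):

```python
import json
import urllib.request

# Base URL taken from the curl example above.
BASE = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def quality_url(repo: str) -> str:
    """Build the quality-endpoint URL for a GitHub repo slug like 'owner/name'."""
    return f"{BASE}/{repo}"


def fetch_quality(repo: str, timeout: int = 10) -> dict:
    """Fetch quality data for a repo; assumes the endpoint returns a JSON object."""
    with urllib.request.urlopen(quality_url(repo), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(quality_url("molereddy/Alternate-Preference-Optimization"))
```

`fetch_quality` mirrors the curl call; swap in an authenticated request if you obtain a free key for the higher rate limit.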
Higher-rated alternatives
stair-lab/mlhp
Machine Learning from Human Preferences
princeton-nlp/SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
uclaml/SPPO
The official implementation of Self-Play Preference Optimization (SPPO)
general-preference/general-preference-model
[ICML 2025] Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment...
sail-sg/dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards