ahans30/goldfish-loss

[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs

Score: 36 / 100 (Emerging)

This project provides a simple training technique, the goldfish loss, that significantly reduces a large language model's (LLM's) tendency to memorize and inadvertently reproduce verbatim passages from its training data. It plugs into an existing LLM training setup and yields a pre-trained or fine-tuned model that is less likely to reveal sensitive or private information from its training set. It is aimed at AI researchers and engineers who pre-train or fine-tune generative LLMs, especially those concerned with data privacy and model safety.
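The core idea behind the goldfish loss is to exclude a pseudorandom but deterministic subset of tokens (roughly 1 in k) from the next-token training loss, so the model never receives a gradient on the complete sequence and cannot regurgitate it verbatim. The sketch below illustrates this in plain Python with a context-hash mask; the parameters `k` and `h` and the SHA-256 hashing scheme are illustrative choices for this sketch, not necessarily the exact ones used in the repository.

```python
import hashlib


def goldfish_mask(token_ids, k=4, h=13):
    """Return a 0/1 keep-mask over token positions.

    A position is dropped (mask = 0.0) when a hash of its trailing
    h-token context falls in the bottom 1/k of the hash range, so about
    1/k of tokens are excluded from the loss. Because the mask depends
    only on the text, the same passage always drops the same tokens,
    which is what prevents verbatim memorization across epochs.
    (k, h, and SHA-256 here are assumptions for illustration.)
    """
    mask = []
    for i in range(len(token_ids)):
        context = token_ids[max(0, i - h): i + 1]
        digest = hashlib.sha256(str(context).encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % k
        mask.append(0.0 if bucket == 0 else 1.0)
    return mask


def goldfish_loss(per_token_losses, token_ids, k=4):
    """Masked mean of per-token cross-entropy losses."""
    mask = goldfish_mask(token_ids, k=k)
    kept = [l * m for l, m in zip(per_token_losses, mask)]
    denom = max(sum(mask), 1.0)  # avoid division by zero if all dropped
    return sum(kept) / denom
```

In a real training loop, `per_token_losses` would come from a cross-entropy computed without reduction, and the masked mean would replace the usual mean over all positions.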

No commits in the last 6 months.

Use this if you are a machine learning engineer or researcher involved in developing and deploying large language models and need to reduce the risk of your models memorizing and leaking training data.

Not ideal if you are not directly involved in low-level training or fine-tuning of large language models, or if your infrastructure does not match the repository's target setup of AMD compute nodes in a SLURM-managed distributed environment.

Tags: LLM training · data privacy · generative AI · model safety · AI research
Status: Stale (6m) · No Package · No Dependents
Maintenance 0 / 25
Adoption 9 / 25
Maturity 16 / 25
Community 11 / 25


Stars: 97
Forks: 8
Language: Python
License: Apache-2.0
Last pushed: Nov 17, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/ahans30/goldfish-loss"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000/day.