brian-lou/Training-Data-Extraction-Attack-on-LLMs

This project explores training data extraction attacks on the LLaMA 7B, GPT-2 XL, and GPT-2-IMDB models to discover memorized content using perplexity, perturbation-based scoring metrics, and large-scale search queries.

Score: 37 / 100 (Emerging)

This project helps evaluate large language models (LLMs) to determine whether they have memorized parts of their training data. You provide a trained LLM and receive a list of generated text sequences, ranked by how likely each is to be a verbatim copy of the training data. This is useful for AI safety researchers, privacy auditors, or anyone scrutinizing the training integrity of LLMs.
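As a rough illustration of the perplexity metric, here is a minimal sketch that ranks candidate generations by how "unsurprised" a model is by them. It assumes the Hugging Face transformers library and the gpt2-xl checkpoint, and the sample strings are placeholders; this is not the project's actual scoring code.

```python
# Minimal sketch (not the project's own code): sequences a model assigns
# unusually low perplexity to are more likely verbatim training-data copies.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl").to(device).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    # Perplexity = exp(mean token-level cross-entropy under the model).
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    return torch.exp(loss).item()

# Rank candidate generations: lowest perplexity first (most suspect).
samples = ["candidate sequence one", "candidate sequence two"]  # placeholders
ranked = sorted(samples, key=perplexity)
```

A perturbation-style variant compares this score against the same text's perplexity under a smaller reference model (for example, plain gpt2); a large gap between the two is a stronger memorization signal than low perplexity alone.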

No commits in the last 6 months.

Use this if you need to determine whether a large language model has unintentionally memorized, and could potentially reproduce, sensitive or proprietary information from its training data.

Not ideal if you are looking for a tool to prevent memorization during model training or to analyze the general biases of an LLM.

Topics: AI Safety, LLM Auditing, Data Privacy, Model Evaluation, Machine Learning Security
Badges: Stale (6 months), No Package, No Dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 16 / 25
Community: 15 / 25


Stars: 15
Forks: 4
Language: Python
License: MIT
Last pushed: Jun 15, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/brian-lou/Training-Data-Extraction-Attack-on-LLMs"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
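The same endpoint can be queried from Python. A minimal sketch follows; the response schema is not documented here, so the example simply prints the parsed JSON rather than assuming field names.

```python
# Hedged sketch: fetch the quality data with the requests library.
import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/"
       "brian-lou/Training-Data-Extraction-Attack-on-LLMs")
resp = requests.get(url, timeout=10)  # no API key needed up to 100 requests/day
resp.raise_for_status()
print(resp.json())  # schema undocumented here, so inspect the raw payload
```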