brian-lou/Training-Data-Extraction-Attack-on-LLMs
This project explores training data extraction attacks on the LLaMA 7B, GPT-2 XL, and GPT-2-IMDB models, discovering memorized content using perplexity and perturbation-based scoring metrics together with large-scale search queries.
This project helps evaluate large language models (LLMs) to identify if they have memorized parts of their training data. You input a trained LLM and receive a list of generated text sequences, ranked by how likely they are to be verbatim copies of the training data. This is useful for AI safety researchers, privacy auditors, or anyone scrutinizing the training integrity of LLMs.
No commits in the last 6 months.
Use this if you need to determine whether a large language model has unintentionally memorized sensitive or proprietary information from its training data and could potentially reproduce it.
Not ideal if you are looking for a tool to prevent memorization during model training or to analyze the general biases of an LLM.
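The core ranking idea the description mentions can be sketched in a few lines: score each generated sequence by its perplexity under the target model, where unusually low perplexity suggests the model may have seen the text verbatim during training. This is a minimal illustration with hypothetical per-token log-probabilities, not the repository's actual implementation (which queries real models such as LLaMA 7B or GPT-2 XL):

```python
import math

def perplexity(token_logprobs):
    # Perplexity = exp(negative mean log-probability) over the tokens.
    # Lower values mean the model finds the text more "familiar".
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def rank_candidates(candidates):
    # candidates: list of (text, per-token log-probs).
    # Sort ascending by perplexity, so likely-memorized text comes first.
    return sorted(candidates, key=lambda c: perplexity(c[1]))

# Hypothetical scores: a memorized string tends to get near-zero
# log-probs per token, hence perplexity close to 1.
cands = [
    ("ordinary generated phrase", [-2.3, -1.9, -2.7]),
    ("possibly memorized string", [-0.05, -0.02, -0.08]),
]
for text, lps in rank_candidates(cands):
    print(f"{text}: perplexity={perplexity(lps):.2f}")
```

In practice the log-probabilities come from the model under audit, and a perturbation or reference-model score is often used alongside raw perplexity to filter out text that is merely generic rather than memorized.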
Stars: 15
Forks: 4
Language: Python
License: MIT
Category:
Last pushed: Jun 15, 2023
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/brian-lou/Training-Data-Extraction-Attack-on-LLMs"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
google/scaaml
SCAAML: Side Channel Attacks Assisted with Machine Learning
pralab/secml
A Python library for Secure and Explainable Machine Learning
Koukyosyumei/AIJack
Security and Privacy Risk Simulator for Machine Learning (arXiv:2312.17667)
AI-SDC/SACRO-ML
Collection of tools and resources for managing the statistical disclosure control of trained...
liuyugeng/ML-Doctor
Code for ML Doctor