brian-lou/Training-Data-Extraction-Attack-on-LLMs

This project explores training data extraction attacks on the LLaMA 7B, GPT-2 XL, and GPT-2-IMDB models to discover memorized content using perplexity, perturbation-based scoring metrics, and large-scale search queries.

Score: 37 / 100 (Emerging)

This project helps evaluate large language models (LLMs) to determine whether they have memorized parts of their training data. You provide a trained LLM and receive a list of generated text sequences, ranked by how likely each is to be a verbatim copy of the training data. This is useful for AI safety researchers, privacy auditors, or anyone scrutinizing the training integrity of LLMs.
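As a rough illustration of the perplexity metric, here is a minimal sketch that ranks candidate generations by how "unsurprised" a model is by them. It assumes the Hugging Face transformers library and the gpt2-xl checkpoint, and the sample strings are placeholders; this is not the project's actual scoring code.

```python
# Minimal sketch (not the project's own code): sequences a model assigns
# unusually low perplexity to are more likely verbatim training-data copies.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
model = GPT2LMHeadModel.from_pretrained("gpt2-xl").to(device).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    # Perplexity = exp(mean token-level cross-entropy under the model).
    ids = tokenizer(text, return_tensors="pt").input_ids.to(device)
    loss = model(ids, labels=ids).loss  # mean negative log-likelihood
    return torch.exp(loss).item()

# Rank candidate generations: lowest perplexity first (most suspect).
samples = ["candidate sequence one", "candidate sequence two"]  # placeholders
ranked = sorted(samples, key=perplexity)
```

A perturbation-style variant compares this score against the same text's perplexity under a smaller reference model (for example, plain gpt2); a large gap between the two is a stronger memorization signal than low perplexity alone.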

No commits in the last 6 months.

Use this if you need to determine whether a large language model has unintentionally memorized, and could potentially reproduce, sensitive or proprietary information from its training data.

Not ideal if you are looking for a tool to prevent memorization during model training or to analyze the general biases of an LLM.

Topics: AI Safety, LLM Auditing, Data Privacy, Model Evaluation, Machine Learning Security
Badges: Stale (6 months), No Package, No Dependents

Score breakdown:
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 16 / 25
Community: 15 / 25


Stars: 15
Forks: 4
Language: Python
License: MIT
Last pushed: Jun 15, 2023
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/brian-lou/Training-Data-Extraction-Attack-on-LLMs"

Open to everyone: 100 requests/day, no key needed. Get a free key for 1,000 requests/day.
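The same endpoint can be queried from Python. A minimal sketch follows; the response schema is not documented here, so the example simply prints the parsed JSON rather than assuming field names.

```python
# Hedged sketch: fetch the quality data with the requests library.
import requests

url = ("https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/"
       "brian-lou/Training-Data-Extraction-Attack-on-LLMs")
resp = requests.get(url, timeout=10)  # no API key needed up to 100 requests/day
resp.raise_for_status()
print(resp.json())  # schema undocumented here, so inspect the raw payload
```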