yyy01/PAC

The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)

Quality score: 27 / 100 (Experimental)

This project helps AI researchers and practitioners identify whether specific data was included in a Large Language Model's (LLM) training set. You provide a dataset of text snippets, and it tells you which ones might have contaminated the LLM. This is for anyone working to ensure the integrity and privacy of LLMs.

No commits in the last 6 months.

Use this if you need to detect data contamination in black-box or white-box Large Language Models, i.e., to verify whether specific text data was used in their training.

Not ideal if you are looking for a general-purpose data cleaning tool unrelated to LLM training data integrity.

Tags: LLM integrity, AI data privacy, model auditing, language model research, AI ethics
Badges: Stale (6m), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 6 / 25
Maturity: 16 / 25
Community: 5 / 25


Stars: 16
Forks: 1
Language: Python
License: MIT
Last pushed: May 21, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/yyy01/PAC"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
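A minimal sketch of calling the same endpoint from Python using only the standard library. The response schema is not documented here, so the field names in `summarize` (`name`, `score`, `status`) are assumptions modeled on the card above, not a confirmed API contract.

```python
import json
from urllib.request import urlopen

API = "https://pt-edge.onrender.com/api/v1/quality/transformers"


def fetch_quality(owner: str, repo: str) -> dict:
    """Fetch the raw quality JSON for owner/repo (requires network access)."""
    with urlopen(f"{API}/{owner}/{repo}") as resp:
        return json.load(resp)


def summarize(payload: dict) -> str:
    """One-line summary; the field names here are guesses at the schema."""
    return f"{payload['name']}: {payload['score']}/100 ({payload['status']})"


# Hypothetical payload shaped like the card above, used in place of a live call:
sample = {"name": "yyy01/PAC", "score": 27, "status": "Experimental"}
print(summarize(sample))
```

Swap the sample payload for `fetch_quality("yyy01", "PAC")` once you have confirmed the actual response fields.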