poloclub/llm-landscape

NeurIPS'24 - LLM Safety Landscape

Quality score: 44 / 100 (Emerging)

This tool helps AI safety researchers and ML engineers understand how robust their large language models (LLMs) are to fine-tuning and other modifications. It takes your fine-tuned LLM and visualizes its 'safety landscape', showing how far the model's weights can be perturbed before its safety suddenly degrades. The output includes plots of this safety basin and a 'VISAGE score' quantifying the model's safety robustness.

Use this if you are developing or deploying LLMs and need to rigorously assess the safety risks associated with fine-tuning, weight adjustments, or harmful attacks.

Not ideal if you are looking for a simple, out-of-the-box solution for general LLM safety scanning without deep technical analysis of model weights.

Tags: AI Safety Research · LLM Evaluation · Model Risk Assessment · Machine Learning Engineering · Responsible AI
No Package · No Dependents
Maintenance: 6 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 15 / 25


Stars: 39
Forks: 7
Language: Python
License: MIT
Last pushed: Oct 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/poloclub/llm-landscape"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
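The same record can be fetched programmatically. Below is a minimal Python sketch: the URL layout is taken from the curl example above, but the response schema is not documented here, so the script simply pretty-prints whatever JSON the endpoint returns.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(registry: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL (path layout taken from the curl example)."""
    return f"{API_BASE}/{registry}/{owner}/{repo}"


def fetch_quality(registry: str, owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON.

    The response schema is not documented on this page, so the parsed
    object is returned as-is rather than mapped to named fields.
    """
    with urllib.request.urlopen(quality_url(registry, owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("transformers", "poloclub", "llm-landscape")
    print(json.dumps(data, indent=2))
```

Anonymous access is rate-limited to 100 requests/day, so cache responses locally if you poll many repositories.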