poloclub/llm-landscape

NeurIPS'24 - LLM Safety Landscape

Quality score: 44 / 100 (Emerging)

This tool helps AI safety researchers and ML engineers understand how robust their large language models (LLMs) are to fine-tuning and other modifications. It takes your fine-tuned LLM and visualizes its 'safety landscape', showing how far the model's weights can be perturbed before its safety suddenly degrades. The output includes plots of this safety basin and a 'VISAGE score' quantifying the model's safety robustness.

Use this if you are developing or deploying LLMs and need to rigorously assess the safety risks associated with fine-tuning, weight adjustments, or harmful attacks.

Not ideal if you are looking for a simple, out-of-the-box solution for general LLM safety scanning without deep technical analysis of model weights.

Tags: AI Safety Research · LLM Evaluation · Model Risk Assessment · Machine Learning Engineering · Responsible AI
No Package · No Dependents
Maintenance: 6 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 15 / 25


Stars: 39
Forks: 7
Language: Python
License: MIT
Last pushed: Oct 21, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/poloclub/llm-landscape"

Open to everyone: 100 requests/day with no key needed. A free key raises the limit to 1,000 requests/day.
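The same record can be fetched programmatically. Below is a minimal Python sketch: the URL layout is taken from the curl example above, but the response schema is not documented here, so the script simply pretty-prints whatever JSON the endpoint returns.

```python
import json
import urllib.request

API_BASE = "https://pt-edge.onrender.com/api/v1/quality"


def quality_url(registry: str, owner: str, repo: str) -> str:
    """Build the quality-endpoint URL (path layout taken from the curl example)."""
    return f"{API_BASE}/{registry}/{owner}/{repo}"


def fetch_quality(registry: str, owner: str, repo: str) -> dict:
    """Fetch the quality record as parsed JSON.

    The response schema is not documented on this page, so the parsed
    object is returned as-is rather than mapped to named fields.
    """
    with urllib.request.urlopen(quality_url(registry, owner, repo), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    data = fetch_quality("transformers", "poloclub", "llm-landscape")
    print(json.dumps(data, indent=2))
```

Anonymous access is rate-limited to 100 requests/day, so cache responses locally if you poll many repositories.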