Uncategorized AI Safety Tools

There are 102 uncategorized tools tracked. 5 score 70 or above (Verified tier). The highest-rated is BlackArch/blackarch at 76/100 with 3,312 stars. 5 of the top 10 are actively maintained.

Get the projects as JSON (the example request sets limit=20)

curl "https://pt-edge.onrender.com/api/v1/datasets/quality?domain=ai-safety&subcategory=uncategorized&limit=20"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
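The same request can be made from Python. The following is a minimal sketch using only the standard library; the function names `build_url` and `fetch_projects` are hypothetical, and the code assumes only that the endpoint returns a UTF-8 JSON body, as the heading above states. The shape of the returned JSON is not documented here, so the sketch returns the parsed payload as-is.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query parameters are copied from the curl example above.
BASE_URL = "https://pt-edge.onrender.com/api/v1/datasets/quality"


def build_url(domain="ai-safety", subcategory="uncategorized", limit=20):
    """Build the request URL with the same parameters as the curl example."""
    query = urlencode({"domain": domain, "subcategory": subcategory, "limit": limit})
    return f"{BASE_URL}?{query}"


def fetch_projects(limit=20, timeout=30):
    """Send an unauthenticated GET request and return the parsed JSON.

    Assumption: the response body is UTF-8 encoded JSON. No structure is
    assumed beyond that, so the caller should inspect the result.
    """
    with urlopen(build_url(limit=limit), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_projects()` performs a live request and counts against the 100 requests/day keyless quota, so it is worth caching the result locally when experimenting.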

# Tool Score Tier
1 BlackArch/blackarch

An ArchLinux based distribution for penetration testers and security researchers.

76
Verified
2 BishopFox/sliver

Adversary Emulation Framework

73
Verified
3 casdoor/casdoor

An open-source AI-first Identity and Access Management (IAM) /AI MCP & agent...

72
Verified
4 0xsyr0/Awesome-Cybersecurity-Handbooks

A huge chunk of my personal notes since I started playing CTFs and working...

71
Verified
5 openguardrails/openguardrails

Protect every action your agent takes.

70
Verified
6 Agent-Threat-Rule/agent-threat-rules

Open detection standard for AI agent threats. Like Sigma, but for prompt...

68
Established
7 zan8in/afrog

A Security Tool for Bug Bounty, Pentest and Red Teaming.

67
Established
8 globalbao/awesome-azure-policy

A curated list of blogs, videos, tutorials, code, tools, scripts, and...

63
Established
9 Pantheon-Security/medusa

AI-first security scanner with 76 analyzers, 9,600+ detection rules, and...

55
Established
10 dapurv5/awesome-red-teaming-llms

Papers from our SoK on Red-Teaming (Accepted at TMLR)

54
Established
11 IBM/ares

AI Robustness Evaluation System

54
Established
12 The-Z-Labs/bof-launcher

bof-launcher - a library for loading, executing and in-memory masking BOFs...

54
Established
13 H3llKa1ser/B00t2R00t

A penetration testing Swiss Army Knife that's suitable for CTF challenges,...

53
Established
14 aisa-group/PostTrainBench

Measuring how well CLI agents like Claude Code or Codex CLI can post-train...

51
Established
15 emmanuelgjr/GenAI-Security-Crosswalk

The most comprehensive open-source mapping of OWASP GenAI risks to industry...

50
Established
16 microsoft/llmail-inject-challenge

Code for the API, workload execution, and agents underlying the...

49
Emerging
17 microsoft/Test_Awareness_Steering

Code for the paper: Linear Control of Test Awareness Reveals Differential...

49
Emerging
18 mensfeld/code-on-incus

Run coding agents in hardened Incus containers with real-time network threat...

49
Emerging
19 skoveit/skovenet

Decentralized Adversary Emulation Framework

49
Emerging
20 Shiritai/sanity-gravity

Providing a strong Gravity in the wild world of Antigravity (AI Agents), to...

48
Emerging
21 bm-github/owasp-social-osint-agent

AI-powered OSINT framework for multi-platform social media intelligence...

48
Emerging
22 knostic/openclaw-shield

Security plugin for OpenClaw agents - prevents secret leaks, PII exposure,...

47
Emerging
23 davidmatousek/tachi

Automated threat modeling toolkit — STRIDE + AI-specific threats in one command

45
Emerging
24 ProfessionalWiki/PageApprovals

Increase trust in your wiki knowledge base via approval workflows

45
Emerging
25 s-r-e-e-r-a-j/GhostBuilder

GhostBuilder is a powerful payload generator tool designed for ethical...

44
Emerging
26 romovpa/claudini

Autoresearch for LLM adversarial attacks

44
Emerging
27 GnomeMan4201/zer0DAYSlater

Instrumented adversarial simulation framework for studying detection,...

44
Emerging
28 nikthink/pauseai.ca

https://pauseai.ca/

43
Emerging
29 Ruso-0/Nreki

MCP plugin that intercepts AI agent edits in RAM, validates them ...

43
Emerging
30 HASHIRU-AI/NAAMSE

Neural Adversarial Agent Mutation-based Security Evaluator

42
Emerging
31 Fan1234-1/tonesoul52

AI governance framework — semantic responsibility, self-auditing memory,...

42
Emerging
32 rehydra-ai/rehydra-sdk

Prevent accidental PII leakage in LLM prompts before they hit the model.

42
Emerging
33 StackOneHQ/defender

Open source prompt injection protection for Agents calling tools (via MCP,...

41
Emerging
34 t81dev/t81-foundation

T81 is the first operating system built for governed, deterministic AI...

41
Emerging
35 aegis-initiative/aegis-governance

[.com] Architectural governance framework for AI systems enabling...

41
Emerging
36 FIND-Lab/AgentWard

AgentWard – Built for all, hardened for OpenClaw.

41
Emerging
37 Tooooa/AgentMark

【ACL 2026 Main】AgentMark: Utility-Preserving Behavioral Watermarking for Agents

41
Emerging
38 trnt-ai/trent-openclaw-security-assessment

Free security assessment for your OpenClaw 🦞 environment. Scans gateway...

40
Emerging
39 ppcvote/prompt-defense-audit

Deterministic LLM prompt defense scanner — 12 attack vectors, pure regex,...

40
Emerging
40 HOLYKEYZ/IntellectSafe

AI defense infrastructure against manipulation, misuse, hallucinations, and...

40
Emerging
41 gd2bk1ng/syntra_kernel

Syntra Kernel is a Cognitive Operating System - A modular,...

39
Emerging
42 readme-SVG/Issues-heroes-badge

Serverless Vercel API that renders an animated SVG badge from validated...

39
Emerging
43 AbdelStark/awesome-ai-safety

A curated list of AI safety resources: alignment, interpretability,...

39
Emerging
44 sane-apps/SaneProcess

Workflow enforcement for coding agents: Claude Code hooks, Codex...

38
Emerging
45 bitflight-devops/hallucination-detector

Zero-dependency Claude Code plugin that catches speculation, invented...

38
Emerging
46 Myth727/ARCHITECT-Universal-Coherence-Engine

Full-stack inference-time coherence engine for LLM conversations. Per-turn...

38
Emerging
47 bkr1297-RIO/rio-receipt-protocol

RIO Receipt Protocol — Cryptographic proof for AI actions. Open standard for...

38
Emerging
48 ansonsaju/project-claudia

Meet Claudia: An adversarial AI agent for Project Sentinel that relentlessly...

38
Emerging
49 NeuZhou/agentimmune

The first firewall for AI agents. Stops prompt injection, data leaks, and...

37
Emerging
50 Diplomat-ai/diplomat-agent

AI agent scanner that finds every tool call that can change the real world and...

37
Emerging
51 fairvisor/edge

Open-source edge engine to control API request budgets and enforce fair usage.

36
Emerging
52 renefichtmueller/ShieldX

Self-Evolving LLM Prompt Injection Defense — 547+ rules, 50+ languages,...

36
Emerging
53 RafaelParonis/jailbench

🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights...

36
Emerging
54 nhowardtli/virp

VIRP Protocol: Cryptographic Trust Primitives and Network Trust Anchor...

36
Emerging
55 efij/secure-claude-code

Security guardrails for Claude Code, MCP tools, and Claude cowork workflows....

34
Emerging
56 kureha-yamaguchi/reasoning-manipulation

Adversarial Manipulation of CoT

33
Emerging
57 Nicholas-Kloster/claude-4.6-jailbreak-vulnerability-disclosure-unredacted

Three Claude production tiers generated functional exploit code against live...

33
Emerging
58 minggo-commits/prompt-labs

The definitive open-source guide for mastering AI Prompt Engineering techniques.

33
Emerging
59 dapurv5/alignmark

Watermarking Degrades Alignment in Language Models: Analysis and Mitigation...

32
Emerging
60 alivsch-ai-code/flux-bot-azamat

Python Telegram bot for AI image and video generation. Replicate API, Neon...

32
Emerging
61 ParraX123/meta-ai-bug-bounty

🛡️ Discover and analyze critical vulnerabilities in Meta AI's Instagram...

31
Emerging
62 Jonathanmutu/recursive-containment-framework

A recursive alignment framework for cognitive coherence, belief tracking,...

31
Emerging
63 teilomillet/enzu

Budgeted LLM runs with hard caps + typed outcomes + async job mode....

31
Emerging
64 AionSystem/AION-BRAIN

The left hemisphere. Frameworks, logic, and certainty architecture. Home of...

31
Emerging
65 Convergence-Human-And-Technology/sovereign-drive

R&D · Legally accountable AI systems for autonomous vehicle operation...

30
Emerging
66 alyssadata/Emergent-Agency

DEMO-READY Automated consistency verification for behavioral tracking....

29
Experimental
67 Dandona100/SafeEyes

Real-time NSFW & harmful content detection as a service

29
Experimental
68 Project-Navi/navi-creative-determinant

The Creative Determinant: autopoietic closure as a nonlinear elliptic BVP on...

29
Experimental
69 aisa-group/skill-inject

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

29
Experimental
70 Lukentony/AI-guardian-lab

Security middleware for AI Agents. Intercepts shell commands before...

28
Experimental
71 LogicPearlHQ/logicpearl

Your logic, distilled into pearls. ⚪

28
Experimental
72 thansz137/asiyah-protocol

A philosophical approach for synthetic minds. An open door, an extended...

28
Experimental
73 URDev4ever/LATT

LLM Attack Testing Toolkit is a structured methodology and mindset framework...

28
Experimental
74 Mindburn-Labs/helm-oss

HELM OSS — Open-source core of the HELM Autonomous OS. Policy enforcement,...

28
Experimental
75 yunbeizhang/MM-Plan

[ICLRW 2026 Oral] Visual Exclusivity Attacks: Automatic Multimodal Red...

27
Experimental
76 HUNT-001/ai-chip-design-platform

Multi-agent RISC-V verification and test-generation framework for...

27
Experimental
77 atani/mysh

MySQL connection manager with SSH tunnel support. Auto-masks sensitive data...

27
Experimental
78 souvikghosh957/secret-sanitizer-extension

Chrome extension that masks secrets & sensitive data before pasting into AI...

27
Experimental
79 Jang-woo-AnnaSoft/execution-boundaries

Design notes on execution boundaries and responsibility structures for AI...

27
Experimental
80 hiranp/hief

HIEF (Hybrid Intent-Evaluation Framework) Persistent Memory Layer for AI...

27
Experimental
81 nhomyk/agenticqa-scan-action

Map every integration point in your AI codebase — 13 CWE categories, attack...

26
Experimental
82 EntropyWizardchaos/developmental-ai-governance

Developmental lifecycle architecture for AI agents: childhood stages,...

26
Experimental
83 Kibertum/SENAR

SENAR — Supervised Engineering & Normative AI Regulation. Open methodology...

26
Experimental
84 Ratila1/JGuardrails

🛡️ Programmable Guardrails for LLM Applications in Java. A...

26
Experimental
85 yurukusa/cc-safe-setup

One command to make Claude Code safe for autonomous operation. 658 example...

26
Experimental
86 nevitonsantana/aletheia

Operating framework for AI-assisted work with decision, governance,...

25
Experimental
87 kiyoshisasano/agent-failure-debugger

A deterministic pipeline that diagnoses, explains, and safely auto-fixes...

25
Experimental
88 0xadvait/divergence-explorer

Autonomous AI researcher that probes where frontier models disagree — with...

25
Experimental
89 prompt-armor/prompt-armor

Open-source prompt injection detector — 5 layers, 91.7% F1, ~27ms, offline,...

25
Experimental
90 AKURHULA/LLMSecurityGuide

🛡️ Explore tools for securing Large Language Models, uncovering their...

24
Experimental
91 dstours/OctoC2

GitHub-native Command & Control Framework. 11 covert channels. Zero infrastructure.

24
Experimental
92 solanticai/vibe-guard

AI coding guardrails framework. Runtime-enforced quality controls for Claude...

24
Experimental
93 dislovelhl/acgs-lite

Constitutional governance infrastructure for AI agents — the missing safety...

24
Experimental
94 michaelgregoryibizugbe/PHANTOM-RECON

👻 PHANTOM-RECON Is An Advanced Network Reconnaissance Tool With...

24
Experimental
95 sushaan-k/infiltr

RL-based LLM red-team framework with MITRE ATLAS reporting and CI-ready outputs

24
Experimental
96 velzepooz/skill-detector

CLI to spot risky AI skill packages before you use them. Scans for...

24
Experimental
97 anthril/vibe-guard

AI coding guardrails framework. Runtime-enforced quality controls for Claude...

24
Experimental
98 gw0/docker-claude-code

Dockerized Claude Code Sandbox

21
Experimental
99 safal207/Living-Relational-Identity-LRI

Living Relational Identity (LRI) defines non-operational invariants that...

20
Experimental
100 zscaler/zguard-ai-integrations

Central documentation repository for all integrations with Zscaler AI Guard

18
Experimental
101 Orellius-Archive/orellius-cognitive

⚠️ UNMAINTAINED — AI personality, safety, red-teaming, and sandboxing in Rust SDK.

16
Experimental
102 imurtuja/InkVerse

Where Code Meets Poetry 🪶 A premium, production-grade social ecosystem for...

16
Experimental
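The tier labels in the table above are consistent with simple score cutoffs. The following sketch encodes cutoffs inferred from the listed rows only (70 is the lowest Verified score, 50 the lowest Established, 30 the lowest Emerging). These thresholds are an assumption, not a documented rule, and the function name `tier_for_score` is hypothetical. No tool in the table scores 69, so the exact Verified boundary is not confirmed by the data.

```python
def tier_for_score(score: int) -> str:
    """Map a 0-100 quality score to a tier label.

    Cutoffs are inferred from the table rows, not from any published
    scoring rule, and may not match the site's actual logic.
    """
    if score >= 70:
        return "Verified"
    if score >= 50:
        return "Established"
    if score >= 30:
        return "Emerging"
    return "Experimental"


# Boundary rows taken from the table above: (score, tier as listed).
observed = [(76, "Verified"), (70, "Verified"), (68, "Established"),
            (50, "Established"), (49, "Emerging"), (30, "Emerging"),
            (29, "Experimental"), (16, "Experimental")]
mismatches = [(s, t) for s, t in observed if tier_for_score(s) != t]
```

With these cutoffs, `mismatches` is empty for the boundary rows sampled from the table.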