onerun-ai/onerun

Open-source framework for stress-testing LLMs and conversational AI. Identify hallucinations, policy violations, and edge cases with scalable, realistic simulations. Join the discord: https://discord.gg/ssd4S37WNW

Score: 23 / 100 (Experimental)

This project helps AI product managers, QA engineers, and conversational AI designers rigorously test their large language models (LLMs) and AI agents. It simulates diverse, realistic user conversations with your agent at scale to surface issues, and outputs evaluation datasets of judge-labeled conversations plus training data to improve your AI.

No commits in the last 6 months.

Use this if you need to thoroughly stress-test your AI agents and LLMs for hallucinations, policy violations, and unexpected edge cases before they reach your users.

Not ideal if you are looking for a simple, no-code solution, as this requires a basic understanding of Docker and local environment setup.

AI-testing conversational-AI LLM-evaluation AI-QA prompt-engineering
Stale (6m) · No Package · No Dependents
Maintenance: 2 / 25
Adoption: 6 / 25
Maturity: 15 / 25
Community: 0 / 25


Stars: 18
Forks:
Language: Python
License: Apache-2.0
Last pushed: Sep 15, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/agents/onerun-ai/onerun"

Open to everyone: 100 requests/day with no key needed. Get a free key for 1,000 requests/day.
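The endpoint path follows the pattern shown in the curl example above (`/api/v1/quality/agents/<owner>/<repo>`). A minimal sketch of a helper that builds the URL for any repository, assuming only that the path pattern generalizes (the helper name is illustrative, not part of the API):

```python
# Build the quality-report URL for a given GitHub repo.
# The base URL and path pattern are taken from the curl example above;
# the generalization to arbitrary owner/repo pairs is an assumption.
BASE = "https://pt-edge.onrender.com/api/v1/quality/agents"

def quality_url(owner: str, repo: str) -> str:
    """Return the quality-report endpoint for owner/repo."""
    return f"{BASE}/{owner}/{repo}"

print(quality_url("onerun-ai", "onerun"))
# → https://pt-edge.onrender.com/api/v1/quality/agents/onerun-ai/onerun
```

Pass the resulting URL to `curl` or any HTTP client as shown above.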