v7labs/benchllm

Continuous Integration for LLM powered applications

Quality score: 46 / 100 (Emerging)

This tool helps AI engineers and developers ensure their Large Language Models (LLMs) and AI applications are working correctly. You input your LLM's code and a set of expected responses for various prompts, and it automatically tests your application. The output is a detailed report highlighting any inaccurate or 'hallucinated' responses, so you can fix them before deployment.
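The testing pattern described above can be sketched in plain Python. This is a hypothetical illustration of the concept (run each prompt, compare the output against a set of accepted answers, collect mismatches), not BenchLLM's actual API; the `fake_llm` function and the test cases are made up for the example.

```python
def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (hypothetical); one answer is
    # deliberately wrong to show how a mismatch gets reported.
    canned = {"What is 1+1?": "2", "Capital of France?": "Berlin"}
    return canned.get(prompt, "I don't know")

def run_suite(cases):
    # Compare each model output against the accepted answers and
    # collect anything that doesn't match -- the "hallucination report".
    failures = []
    for prompt, expected in cases:
        output = fake_llm(prompt)
        if output not in expected:
            failures.append((prompt, output, expected))
    return failures

cases = [
    ("What is 1+1?", {"2", "two"}),
    ("Capital of France?", {"Paris"}),
]
report = run_suite(cases)
# Only the wrong answer ends up in the report.
```

A CI job would run such a suite on every commit and fail the build when the report is non-empty.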

254 stars. No commits in the last 6 months. Available on PyPI.

Use this if you are building applications powered by LLMs, agents, or chains (like Langchain) and need to consistently verify their accuracy and prevent incorrect outputs across different versions.

Not ideal if you are not developing with Large Language Models, or if you need a fully stable, actively maintained solution: the project has seen no commits in the last six months.

Tags: LLM development, AI application testing, model validation, AI quality assurance, machine learning engineering
Status: Stale (no commits in 6 months)
Maintenance: 0 / 25
Adoption: 10 / 25
Maturity: 25 / 25
Community: 11 / 25


Stars: 254
Forks: 13
Language: Python
License: MIT
Last pushed: Aug 11, 2023
Commits (30d): 0
Dependencies: 5

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/v7labs/benchllm"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
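The same endpoint can be called from Python with the standard library. This is a minimal sketch assuming only what the `curl` line above documents (the URL pattern and anonymous access); the response schema is not specified here, so the helper just returns the parsed JSON.

```python
import json
from urllib.request import urlopen

BASE = "https://pt-edge.onrender.com/api/v1/quality"

def quality_url(category: str, owner: str, repo: str) -> str:
    # Build the per-project endpoint shown in the curl example above.
    return f"{BASE}/{category}/{owner}/{repo}"

def fetch_quality(category: str, owner: str, repo: str) -> dict:
    # Anonymous access is rate-limited to 100 requests/day.
    with urlopen(quality_url(category, owner, repo)) as resp:
        return json.load(resp)

# Example (performs a live request, so it is commented out here):
# data = fetch_quality("llm-tools", "v7labs", "benchllm")
```

How an API key is passed (header vs. query parameter) is not documented in this listing, so the sketch sticks to anonymous access.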