IngestAI/deepmark

Deepmark AI provides a testing environment for assessing large language models (LLMs) on task-specific metrics using your own data, so that your GenAI-powered solution performs predictably and reliably.


Experimental

This tool helps Generative AI application builders ensure that their AI solutions perform reliably and predictably. You supply your own data, choose the Large Language Models (LLMs) to evaluate, and define task-specific criteria such as question-answering accuracy or cost. The tool then reports assessment results showing which LLM best meets your application's needs. It is designed for developers building GenAI-powered applications.

104 stars. No commits in the last 6 months.

Use this if you are a developer building Generative AI applications and need to rigorously test and compare different Large Language Models on your own data to ensure predictable, reliable, and cost-effective performance.

Not ideal if you are an end user of a GenAI application who is not involved in development or model selection.

Generative AI development, LLM evaluation, AI application testing, Model benchmarking, AI performance assessment

Stale (6 months), No package, No dependents

Maintenance 0 / 25

Adoption 9 / 25

Maturity 16 / 25

Community 4 / 25


Stars: 104
Language: PHP
License: AGPL-3.0

Featured in: You're Shipping AI You Can't Measure

Higher-rated alternatives

EvolvingLMMs-Lab/lmms-eval: One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
vibrantlabsai/ragas: Supercharge Your LLM Application Evaluations 🚀
open-compass/VLMEvalKit: Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
EuroEval/EuroEval: The robust European language model benchmark.
Giskard-AI/giskard-oss: 🐢 Open-Source Evaluation & Testing library for LLM Agents
