cloudguruab/modsysML

Reinforcement learning from human feedback (RLHF) framework for AI models. Evaluate and compare LLM outputs, test quality, catch regressions, and automate.

Score: 41 / 100 (Emerging)

This tool helps AI engineers and product managers systematically evaluate and compare the outputs of large language models (LLMs). You supply prompts and test cases, and it produces a table view or structured data (JSON or CSV) showing how each prompt performs, so you can quickly identify the best-performing prompts and catch performance regressions.

Use this if you are a machine learning engineer or product manager who needs to rigorously test and compare different LLM prompts across many scenarios to ensure model quality and catch regressions.

Not ideal if you need a graphical user interface for visual testing and reporting; it works primarily via the command line or a Python library.
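
To make the workflow concrete, here is a minimal Python sketch of the prompt-comparison loop described above. It is illustrative only and does not use modsysML's actual API: call_model is a hypothetical stand-in for your LLM client, and the pass criterion is a simple substring match.

import json

# Hypothetical stand-in for a real LLM call; swap in your provider's SDK.
def call_model(prompt: str) -> str:
    return "stub response"

# Prompt variants to compare.
prompts = {
    "terse": "Answer in one word: {q}",
    "verbose": "Explain step by step, then answer: {q}",
}

# Each test case pairs an input with a substring the output must contain.
test_cases = [
    {"q": "capital of France", "expect": "Paris"},
    {"q": "2 + 2", "expect": "4"},
]

results = []
for name, template in prompts.items():
    passed = sum(
        1 for case in test_cases
        if case["expect"] in call_model(template.format(q=case["q"]))
    )
    results.append({"prompt": name, "passed": passed, "total": len(test_cases)})

# Structured output, analogous to the tool's JSON/CSV views.
print(json.dumps(results, indent=2))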

Tags: AI model evaluation, LLM prompt engineering, machine learning quality assurance, AI product management, data science workflow
No Package · No Dependents
Maintenance 6 / 25
Adoption 7 / 25
Maturity 16 / 25
Community 12 / 25

Stars: 36
Forks: 5
Language: Python
License: Apache-2.0
Last pushed: Dec 01, 2025
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/cloudguruab/modsysML"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
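
The same endpoint can also be called from Python with only the standard library; a minimal sketch, assuming the response body is JSON (the schema is not documented on this page):

import json
import urllib.request

# Endpoint shown above; no key needed for up to 100 requests/day.
URL = "https://pt-edge.onrender.com/api/v1/quality/transformers/cloudguruab/modsysML"

with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

# Pretty-print whatever JSON comes back.
print(json.dumps(data, indent=2))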