AmirhosseinHonardoust/Detector-Reliability-Report-Card

Decision-safe evaluation + Streamlit dashboard for AI vs Human vs Post-Edited AI text detection. Generates a reliability report card (Accuracy, Macro F1, ECE (Expected Calibration Error), Brier score), calibration plots, confidence histograms, and a coverage-vs-performance abstention curve. Recommends an operating threshold for human-review routing.

Quality score: 26 / 100 (Experimental)

This project helps content moderators, integrity analysts, and policy enforcement teams evaluate the trustworthiness of AI-powered text detection systems. It takes predictions from your AI text detector and ground truth labels, then produces a comprehensive "Reliability Report Card." This report includes performance metrics, calibration plots, and a recommended confidence threshold for routing uncertain cases to human review.
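
For a concrete sense of the headline numbers, here is a minimal, self-contained sketch of how such metrics can be computed. It is illustrative only, not the repo's actual code: it simplifies the three-way AI/Human/Post-Edited task to a binary one and assumes a hypothetical predictions.csv with a "label" column and a "confidence" column.

```python
# Illustrative sketch only; the repo's actual input format and code may differ.
# Assumes a hypothetical predictions.csv with a "label" column (0 = human, 1 = AI)
# and a "confidence" column (the detector's predicted probability of AI).
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, brier_score_loss, f1_score

df = pd.read_csv("predictions.csv")
y_true = df["label"].to_numpy()
y_prob = df["confidence"].to_numpy()
y_pred = (y_prob >= 0.5).astype(int)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE: weighted mean |fraction of positives - mean confidence| per bin."""
    idx = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(y_true[mask].mean() - y_prob[mask].mean())
    return ece

print(f"Accuracy : {accuracy_score(y_true, y_pred):.3f}")
print(f"Macro F1 : {f1_score(y_true, y_pred, average='macro'):.3f}")
print(f"Brier    : {brier_score_loss(y_true, y_prob):.3f}")
print(f"ECE      : {expected_calibration_error(y_true, y_prob):.3f}")
```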

Use this if you need to understand not just whether your detector is accurate, but also when to trust its confidence scores and when to route cases to human reviewers to minimize costly errors.
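
The coverage-vs-performance abstention curve behind that routing decision can be sketched in a few lines: sweep a confidence threshold, auto-decide only the cases above it, and track coverage (the fraction auto-decided) against accuracy on the covered subset. The snippet below uses synthetic, perfectly calibrated scores as a stand-in for real detector output; all names and the 0.95 accuracy target are illustrative assumptions, not the repo's API.

```python
# Sketch of a coverage-vs-performance abstention curve; not the repo's exact code.
import numpy as np

rng = np.random.default_rng(0)
y_prob = rng.uniform(size=1000)                         # stand-in detector confidences
y_true = (rng.uniform(size=1000) < y_prob).astype(int)  # calibrated toy labels

y_pred = (y_prob >= 0.5).astype(int)
margin = np.abs(y_prob - 0.5)          # distance from the 0.5 decision boundary

rows = []
for t in np.linspace(0.5, 0.99, 50):   # candidate operating thresholds
    keep = margin >= (t - 0.5)         # auto-decide only sufficiently confident cases
    if keep.any():
        rows.append((t, keep.mean(), (y_pred[keep] == y_true[keep]).mean()))

# Recommend the lowest threshold whose auto-decided accuracy clears a target,
# routing everything less confident to human review.
target = 0.95
ok = [(t, cov) for t, cov, acc in rows if acc >= target]
if ok:
    t, cov = ok[0]
    print(f"threshold={t:.2f}: auto-decide {cov:.0%}, send {1 - cov:.0%} to review")
```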

Not ideal if you only need a basic accuracy score for your text detector and are not concerned with calibration, abstention policies, or human-in-the-loop workflows.

content-moderation trust-and-safety AI-detection human-in-the-loop AI-ethics
No package · No dependents
Maintenance: 10 / 25
Adoption: 5 / 25
Maturity: 11 / 25
Community: 0 / 25

Stars: 10
Forks:
Language: Python
License: MIT
Last pushed: Feb 14, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/AmirhosseinHonardoust/Detector-Reliability-Report-Card"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
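
The same endpoint can be called from Python; here is a minimal equivalent of the curl command above using the requests library. The response schema is not documented here, so the snippet simply prints the returned JSON for inspection.

```python
import requests

URL = ("https://pt-edge.onrender.com/api/v1/quality/ml-frameworks/"
       "AmirhosseinHonardoust/Detector-Reliability-Report-Card")

resp = requests.get(URL, timeout=10)
resp.raise_for_status()   # free tier allows 100 requests/day without a key
print(resp.json())        # inspect the returned JSON for the score breakdown
```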