rentruewang/bocoel

Bayesian Optimization as a Coverage Tool for Evaluating LLMs. Accurate evaluation (benchmarking) that's 10 times faster with just a few lines of modular code.

Score: 47 / 100 (Emerging)

Bocoel helps AI researchers and machine learning engineers quickly and accurately evaluate how well large language models (LLMs) perform on various tasks. You provide a large dataset, and it intelligently selects a small, representative subset to test the LLM on, giving you fast and reliable performance metrics. It is ideal for anyone working with LLMs who needs to benchmark models efficiently without spending excessive time or computational resources.
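A minimal sketch of that idea, assuming (as the project's name suggests) Bayesian optimization over an embedding of the dataset: fit a Gaussian process to the scores of examples evaluated so far, and keep probing where the model is least certain so a small budget of LLM calls covers the whole corpus. This is not bocoel's actual API; the embeddings and the LLM scorer below are random stand-ins.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Stand-in for sentence embeddings of a 10,000-example benchmark.
embeddings = rng.normal(size=(10_000, 8))

def evaluate_llm(idx: int) -> float:
    """Stand-in for running the LLM on example `idx` and scoring the output."""
    # Pretend model quality varies smoothly over the embedding space.
    return float(np.tanh(embeddings[idx, 0]) + 0.1 * rng.normal())

# Seed with a few random evaluations, then let the GP choose the rest.
evaluated = [int(i) for i in rng.choice(len(embeddings), size=5, replace=False)]
scores = [evaluate_llm(i) for i in evaluated]

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
budget = 50  # total LLM calls instead of 10,000

for _ in range(budget - len(evaluated)):
    gp.fit(embeddings[evaluated], scores)
    mean, std = gp.predict(embeddings, return_std=True)
    # Pure-exploration acquisition: probe where the GP is least certain,
    # which spreads evaluations to cover the embedding space.
    std[evaluated] = -np.inf  # never re-evaluate the same example
    nxt = int(np.argmax(std))
    evaluated.append(nxt)
    scores.append(evaluate_llm(nxt))

# The GP posterior mean over the whole corpus estimates the full-run score.
gp.fit(embeddings[evaluated], scores)
print(f"estimated benchmark score: {gp.predict(embeddings).mean():.3f}")

With 50 LLM calls instead of 10,000, the posterior mean stands in for the full benchmark score; the uncertainty-maximizing acquisition rule is one simple choice, not necessarily the one bocoel uses.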


Use this if you need to benchmark the accuracy of large language models on extensive datasets but want to drastically reduce the time and cost of evaluation.

Not ideal if you are evaluating small datasets or are not working with large language models.

Tags: LLM evaluation, AI benchmarking, machine learning research, model performance optimization, data sampling
No package · No dependents
Maintenance 10 / 25
Adoption 10 / 25
Maturity 16 / 25
Community 11 / 25

How are scores calculated? The overall score is the sum of the four 25-point components above: 10 + 10 + 16 + 11 = 47 / 100.

Stars: 289
Forks: 16
Language: Python
License: BSD-3-Clause
Last pushed: Jan 18, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/rentruewang/bocoel"

Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
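For scripted access, the same endpoint can be called from Python. A minimal sketch using only the standard library; the response schema is not documented on this page, so it simply pretty-prints whatever JSON comes back.

import json
import urllib.request

# Same endpoint as the curl example above.
URL = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/rentruewang/bocoel"

with urllib.request.urlopen(URL) as resp:  # anonymous access: 100 requests/day
    data = json.load(resp)

print(json.dumps(data, indent=2))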