haesleinhuepf/human-eval-bia

Benchmarking Large Language Models for Bio-Image Analysis Code Generation

Overall score: 41 / 100 (Emerging)

This tool helps bio-image analysis researchers, lab managers, and scientists evaluate how well different Large Language Models (LLMs) can write Python code for their specific image analysis tasks. You provide a set of bio-image analysis problems together with human-written reference solutions, and it automatically tests code generated by various LLMs, showing you which models produce correct code. The output is a clear report on the accuracy of each model's generated code.

No commits in the last 6 months.

Use this if you need to compare different AI code-generation models to find the most reliable one for automating bio-image analysis scripting, or if you're developing new test cases for bio-image analysis programming challenges.

Not ideal if you are looking for a tool to generate bio-image analysis code directly, or if you only need to run existing analysis scripts.
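For anyone developing new test cases, the sketch below shows what a HumanEval-style benchmark case can look like: a reference function with a docstring describing the task, plus a check function that asserts correct behavior on known data. The function names, the scikit-image-based example task, and the exact convention are illustrative assumptions, not taken verbatim from the repository.

import numpy as np
from skimage.measure import label

def count_labeled_objects(binary_image):
    """
    Takes a binary image and returns the number of
    connected components (objects) in it.
    """
    # Assign a unique integer label to each connected component.
    labeled = label(binary_image)
    return labeled.max()

def check(candidate):
    # A 5x5 binary image containing two separate square objects.
    image = np.zeros((5, 5), dtype=np.uint8)
    image[0:2, 0:2] = 1   # first object
    image[3:5, 3:5] = 1   # second object
    assert candidate(image) == 2

# The reference solution itself must pass its own check.
check(count_labeled_objects)

A benchmark harness built on this convention would hand each model the docstring as a prompt, pass the model-generated candidate function to check, and count how many cases pass per model.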

Tags: bio-image analysis, microscopy, scientific computing, research automation, code evaluation
Status: Stale (6 months), No Package, No Dependents
Maintenance: 0 / 25
Adoption: 7 / 25
Maturity: 16 / 25
Community: 18 / 25


Stars: 25
Forks: 14
Language: Jupyter Notebook
License: MIT
Last pushed: Nov 21, 2024
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/transformers/haesleinhuepf/human-eval-bia"

Open to everyone: 100 requests/day, no key needed. Get a free API key for 1,000 requests/day.
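If you prefer Python to curl, the same endpoint can be queried with the standard library. This is a minimal sketch: it assumes the endpoint returns JSON, and it makes no assumption about the response schema beyond that.

import json
import urllib.request

# Endpoint URL copied from the curl example above.
URL = ("https://pt-edge.onrender.com/api/v1/quality/"
       "transformers/haesleinhuepf/human-eval-bia")

# Fetch the quality data and parse it as JSON.
with urllib.request.urlopen(URL) as response:
    data = json.loads(response.read().decode("utf-8"))

# Pretty-print whatever fields the API returns.
print(json.dumps(data, indent=2))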