seal-research/OmniCode

OmniCode: A Diverse Software Engineering Benchmark for Evaluating Large Language Models

Overall score: 26 / 100 (Experimental)

This project provides a standardized way to measure how well AI models, specifically Large Language Models (LLMs), perform on common software development tasks. Given a codebase and a problem description, it evaluates the LLM's ability to fix bugs, generate tests, apply style guidelines, or respond to code review feedback. Software engineering researchers and developers building AI-powered coding assistants can use it to benchmark and compare their models.

Use this if you are a researcher or developer who needs to rigorously evaluate the software engineering capabilities of a Large Language Model across different coding challenges.

Not ideal if you are looking for an AI assistant to help you write code or automate development tasks directly; this is purely a benchmarking tool.
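
As a rough illustration of the task inputs described above (a codebase, a problem description, and one of the four task types), here is a minimal sketch in Python. The class and field names are purely hypothetical and do not reflect OmniCode's actual data format or API.

from dataclasses import dataclass
from enum import Enum

class TaskType(Enum):
    # The four capabilities the benchmark description mentions.
    BUG_FIX = "bug_fix"
    TEST_GENERATION = "test_generation"
    STYLE_APPLICATION = "style_application"
    CODE_REVIEW_RESPONSE = "code_review_response"

@dataclass
class BenchmarkTask:
    repo_url: str            # the codebase the model must work in
    problem_statement: str   # natural-language description of the problem
    task_type: TaskType      # which capability is being evaluated

# Illustrative instance only (not a real benchmark item):
example = BenchmarkTask(
    repo_url="https://github.com/example/project",
    problem_statement="Fix the off-by-one error in the pagination helper.",
    task_type=TaskType.BUG_FIX,
)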

Tags: LLM evaluation, software engineering, research, code quality, test automation, AI development
No License · No Package · No Dependents
Maintenance: 13 / 25
Adoption: 5 / 25
Maturity: 8 / 25
Community: 0 / 25

The four sub-scores, each out of 25, sum to the overall 26 / 100.

Stars: 13
Forks:
Language: Python
License: None
Last pushed: Mar 16, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/seal-research/OmniCode"

The API is open to everyone at 100 requests/day with no key required; a free key raises the limit to 1,000 requests/day.
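
For programmatic use, a minimal Python sketch of the same request is below. The endpoint is taken from the curl example above; the shape of the JSON response is not documented here, so the code only parses and prints whatever the API returns rather than assuming specific field names.

import requests

API_BASE = "https://pt-edge.onrender.com/api/v1/quality/ai-coding"

def fetch_quality_report(owner: str, repo: str) -> dict:
    """Fetch the quality report for owner/repo and return the parsed JSON."""
    url = f"{API_BASE}/{owner}/{repo}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # surfaces 4xx/5xx errors, e.g. hitting the daily rate limit
    return response.json()

if __name__ == "__main__":
    report = fetch_quality_report("seal-research", "OmniCode")
    print(report)  # field names depend on the API's actual response schema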