seal-research/OmniCode
OmniCode: A Diverse Software Engineering Benchmark for Evaluating Large Language Models
This project provides a standardized way to measure how well AI models, specifically Large Language Models (LLMs), perform at various software development tasks. Given a codebase and a problem description, it evaluates the LLM's ability to fix bugs, generate tests, apply style guidelines, or respond to code review feedback. Software engineering researchers and developers building AI-powered coding assistants can use it to benchmark and compare their models.
Use this if you are a researcher or developer who needs to rigorously evaluate the software engineering capabilities of a Large Language Model across different coding challenges.
Not ideal if you are looking for an AI assistant to help you write code or automate development tasks directly; this is purely a benchmarking tool.
Stars: 13
Forks: —
Language: Python
License: —
Category: —
Last pushed: Mar 16, 2026
Commits (30d): 0
Get this data via API
curl "https://pt-edge.onrender.com/api/v1/quality/ai-coding/seal-research/OmniCode"
Open to everyone — 100 requests/day, no key needed. Get a free key for 1,000/day.
Higher-rated alternatives
k4black/codebleu
Pip compatible CodeBLEU metric implementation available for linux/macos/win
LiveCodeBench/LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of...
EdinburghNLP/code-docstring-corpus
Preprocessed Python functions and docstrings for automated code documentation (code2doc) and...
hendrycks/apps
APPS: Automated Programming Progress Standard (NeurIPS 2021)
solis-team/Hydra
[FSE 2026] Do Not Treat Code as Natural Language: Implications for Repository-Level Code...