microsoft/SWE-bench-Live

[NeurIPS 2025 D&B] 🚀 SWE-bench Goes Live!

Score: 53 / 100 (Established)

SWE-bench-Live helps AI researchers and developers evaluate how well their systems resolve real-world software engineering issues. It takes a model's proposed code changes (patches) for identified bugs or tasks across various languages and platforms, and benchmarks its performance against a continuously updated dataset of real-world problems. It is intended for developers and researchers building and improving AI-powered software development tools and agents.
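To make the workflow concrete, here is a minimal sketch of preparing a prediction for evaluation. It assumes SWE-bench-Live accepts SWE-bench-style JSONL predictions with instance_id, model_name_or_path, and model_patch fields; the field names, instance ID, and file name are illustrative assumptions, so check the repository docs for the exact schema.

import json

# Hypothetical example of one prediction in a SWE-bench-style JSONL file.
# Field names assume upstream SWE-bench conventions; the instance ID and
# patch below are made up for illustration.
prediction = {
    "instance_id": "example-org__example-repo-123",
    "model_name_or_path": "my-model",
    "model_patch": (
        "diff --git a/src/app.py b/src/app.py\n"
        "--- a/src/app.py\n"
        "+++ b/src/app.py\n"
        "@@ -1 +1 @@\n"
        "-print('bug')\n"
        "+print('fixed')\n"
    ),
}

# One JSON object per line; an evaluation harness would read this file
# and test each patch against the corresponding task instance.
with open("predictions.jsonl", "w") as f:
    f.write(json.dumps(prediction) + "\n")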


Use this if you are developing AI models designed to fix software bugs or implement new features and need a robust, current, and objective way to measure their performance.

Not ideal if you are a software developer looking for a tool to fix bugs or automate your own daily coding tasks; this is an evaluation framework for AI systems, not a coding assistant.

Tags: AI-evaluation, software-engineering-AI, LLM-benchmarking, code-repair-AI, AI-developer-tools
No Package · No Dependents
Maintenance: 10 / 25
Adoption: 10 / 25
Maturity: 16 / 25
Community: 17 / 25

How are scores calculated?
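The page does not state the formula, but the four subscores above add up to the overall score, which suggests a plain sum of four 25-point components. A quick check of that assumption in Python:

# Assumption: the overall score is the sum of the four 25-point
# subscores shown above (10 + 10 + 16 + 17 = 53).
subscores = {"Maintenance": 10, "Adoption": 10, "Maturity": 16, "Community": 17}
print(sum(subscores.values()))  # 53, matching the 53 / 100 overall score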

Stars: 170
Forks: 23
Language: Python
License: MIT
Last pushed: Mar 09, 2026
Commits (30d): 0

Get this data via API

curl "https://pt-edge.onrender.com/api/v1/quality/llm-tools/microsoft/SWE-bench-Live"

Open to everyone: 100 requests/day with no key. Get a free key for 1,000 requests/day.
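The same request from Python, for scripting against the endpoint. This is a minimal sketch: the URL comes from the curl command above, but the shape of the JSON response is not documented here, so the code just prints whatever comes back.

import requests  # third-party: pip install requests

# Same endpoint as the curl command above.
url = "https://pt-edge.onrender.com/api/v1/quality/llm-tools/microsoft/SWE-bench-Live"

resp = requests.get(url, timeout=10)
resp.raise_for_status()  # fail loudly on rate limiting or server errors

data = resp.json()
print(data)  # the response schema is undocumented here; inspect before relying on keys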