Software Engineering Arena is an open-source initiative to transparently evaluate and track AI assistants across real-world software engineering tasks. We provide interactive platforms, tracking systems, and novel metrics to advance the field of AI-assisted software development.
"The easier it is to verify a solution, the faster an AI system can learn to master the task." > — Alperen Keles (@alpaylan), Andrej Karpathy (@karpathy), Jason Wei (@jasonwei20)
Our mission: We believe any evaluable task can eventually be automated with high-quality AI systems. We accelerate this transformation in software engineering by developing benchmarks and leaderboards that rigorously evaluate AI capabilities.
We welcome collaboration from research labs, independent contributors, and the broader SE community!
Evaluate AI assistants through pairwise comparisons in user-oriented software engineering scenarios:
- Compare foundation models in multi-round conversational workflows with repository-aware context, with results published on transparent leaderboards.
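The rating scheme behind these leaderboards is not specified in this overview. Purely as an illustration, the sketch below aggregates hypothetical pairwise votes into a ranking with an Elo-style update; the assistant names, baseline rating, and K-factor are placeholders, not the platform's actual configuration.

```python
from collections import defaultdict

K = 32  # update step size (placeholder; the platform's real method may differ)

def update_elo(ratings, winner, loser, k=K):
    """Apply one Elo-style update after a single pairwise vote."""
    ra, rb = ratings[winner], ratings[loser]
    expected_win = 1 / (1 + 10 ** ((rb - ra) / 400))  # winner's expected score
    ratings[winner] = ra + k * (1 - expected_win)
    ratings[loser] = rb - k * (1 - expected_win)

# Hypothetical votes: (preferred assistant, other assistant) per comparison.
votes = [
    ("assistant_a", "assistant_b"),
    ("assistant_b", "assistant_c"),
    ("assistant_a", "assistant_c"),
]

ratings = defaultdict(lambda: 1000.0)  # every assistant starts from a common baseline
for winner, loser in votes:
    update_elo(ratings, winner, loser)

# Print a simple leaderboard, highest rating first.
for name, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {score:7.1f}")
```

A Bradley-Terry model fit over the full vote history is a common alternative to incremental Elo updates for arena-style leaderboards.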
Evaluate AI assistants through their actual GitHub activity:
- Track assistants via the issue tracking ecosystem: bug reports, feature requests, resolution of outstanding issues, community discussions, question answering, and polls.
- Track assistants via pull requests: merge rates, feature quality, and iterative improvements (see the sketch after this list).
- Track assistants via code reviews: issue identification, feedback timeliness, and collaborative atmosphere.
- Track assistants via product releases: release activity, version publishing, and real-world deployment patterns.
- Track assistants via wiki documentation: documentation contributions, wiki page edits, and knowledge base maintenance.
- Track assistants via team management: membership events, collaboration patterns, and team organization activities.
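All of these signals can be collected from the GitHub REST API. As a minimal, illustrative sketch (the repository, the tracked bot account, and the merge-rate definition below are assumptions, not the arena's actual pipeline), pull request merge rates might be computed like this:

```python
import requests

# Illustrative placeholders: the arena's actual repositories, tracked accounts,
# and authentication are not described in this README.
REPO = "octocat/Hello-World"
ASSISTANT_LOGINS = {"example-ai-bot"}  # hypothetical assistant account(s)

def fetch_pulls(repo, state="all", per_page=100):
    """Fetch one page of pull requests for a repository via the GitHub REST API."""
    url = f"https://api.github.com/repos/{repo}/pulls"
    resp = requests.get(url, params={"state": state, "per_page": per_page})
    resp.raise_for_status()
    return resp.json()

def merge_rate(pulls, logins):
    """Share of the tracked accounts' closed pull requests that were merged."""
    closed = [p for p in pulls
              if p["user"]["login"] in logins and p["state"] == "closed"]
    if not closed:
        return None  # no closed PRs from the tracked accounts on this page
    merged = sum(1 for p in closed if p.get("merged_at"))
    return merged / len(closed)

if __name__ == "__main__":
    pulls = fetch_pulls(REPO)
    print("merge rate:", merge_rate(pulls, ASSISTANT_LOGINS))
```

In practice, a collector would also need authentication (an `Authorization: Bearer <token>` header) and pagination to cover repositories with more than one page of pull requests.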
All projects under Software Engineering Arena are licensed under the Apache License 2.0, and the data we collect and open-source is released under the same license.