BigCodeArena: Judging code generations end to end with code executions
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Compare two AI models by sending them code and seeing their responses
Explore code generation model benchmarks and solve rates