Questions regarding Gemini 2.5 pro results on leaderboard

Hi folks,

I'm trying to reproduce Gemini 2.5 Pro's leaderboard results locally and would like to clarify a few things:

1. What's the difference between `Gemini-2.5-Pro-06-05` and `Gemini-2.5-Pro-05-06`? Their pass@1 scores differ substantially (74.2 vs. 68.1 on the 01/01/2025 to 05/01/2025 window).
2. Is there an explicit list of the model hyperparameters used for each submission (top_p, temperature, max_tokens, etc.)? I could only find top_p=0.95 and temperature=0.2 in the paper, and I'm not sure whether the submissions used the same setup.
3. Could changes to the model over the past 5 months (from May to now) affect the benchmark results? (Edited: fixed a mistake in my earlier wording.)
4. In general, what's the best strategy for reproducing the leaderboard results for Gemini and other models?
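For comparing local numbers against the leaderboard, here is a minimal sketch of a windowed pass@1 computation. It assumes the leaderboard uses the standard unbiased pass@k estimator, 1 - C(n - c, k) / C(n, k), averaged over problems whose release date falls in the window, with inclusive bounds and dates read as MM/DD/YYYY. `ProblemResult`, `pass_at_k`, and `windowed_pass_at_k` are hypothetical names for illustration, not part of the benchmark's code.

```python
from dataclasses import dataclass
from datetime import date
from math import comb


@dataclass
class ProblemResult:
    # Hypothetical record: one benchmark problem and its sampled generations.
    release_date: date  # date the problem was published
    n: int  # number of generations sampled for this problem
    c: int  # number of generations that passed all tests


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset of the samples contains at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def windowed_pass_at_k(results, start: date, end: date, k: int = 1) -> float:
    """Mean pass@k (as a percentage) over problems released in [start, end].

    Assumption: both window bounds are inclusive.
    """
    selected = [r for r in results if start <= r.release_date <= end]
    if not selected:
        raise ValueError("no problems in the requested date window")
    scores = [pass_at_k(r.n, r.c, k) for r in selected]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data, not real benchmark results.
    results = [
        ProblemResult(date(2025, 1, 15), n=10, c=8),
        ProblemResult(date(2025, 3, 2), n=10, c=5),
        ProblemResult(date(2025, 6, 1), n=10, c=10),  # outside the window, ignored
    ]
    print(windowed_pass_at_k(results, date(2025, 1, 1), date(2025, 5, 1)))
```

With k=1 the estimator reduces to c/n per problem, so differences between two submissions on the same window come down to per-problem pass rates and which problems fall inside the window; the window bounds are worth double-checking when a local number does not match.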

Thanks in advance for your help!

Questions regarding Gemini 2.5 pro results on leaderboard #128

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Questions regarding Gemini 2.5 pro results on leaderboard #128

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions