Questions regarding Gemini 2.5 Pro results on the leaderboard #128

@xinqianpku

Hi folks,

I'm trying to reproduce the Gemini 2.5 Pro results locally and would like to clarify a few things:

  1. What's the difference between Gemini-2.5-Pro-06-05 and Gemini-2.5-Pro-05-06? Their pass@1 scores differ a lot (74.2 vs 68.1 on 01/01/2025-05/01/2025).
  2. Is there an explicit list of the model hyperparameters used per submission (top_p, temperature, max_tokens, etc.)? I could only find top_p=0.95 and temperature=0.2 in the paper, and I'm not sure whether the submissions used the same setup.
  3. Could changes to the model over the past 5 months (from May to now) affect the benchmark results? (Edited: I noticed a mistake in my previous statement.)
  4. In general, what's the best strategy for reproducing the leaderboard results for Gemini and other models?
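
For context, here is a minimal sketch of how pass@1 could be computed locally when comparing against the leaderboard numbers. It assumes the standard unbiased pass@k estimator; the function names and the per-problem `(n, c)` result format are hypothetical, and whether the leaderboard pipeline does exactly this is not confirmed here. The sampling values are the ones quoted from the paper in question 2.

```python
from math import comb

# Sampling settings quoted from the paper (question 2). Whether the leaderboard
# submissions used these same values is an open question, so treat them as an
# assumption rather than the confirmed setup.
SAMPLING = {"temperature": 0.2, "top_p": 0.95}


def pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k estimate for one problem.

    n: number of samples generated, c: number of samples that passed all tests.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int = 1) -> float:
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical per-problem outcomes: (samples drawn, samples correct).
    demo = [(10, 5), (10, 10), (10, 0)]
    print(f"pass@1 = {mean_pass_at_k(demo, k=1):.3f}")
```

With k=1 the estimator reduces to the fraction of correct samples per problem, so differences in the number of samples drawn or in the sampling settings between a local run and a submission could plausibly account for part of a score gap.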

Thanks in advance for your help!
