Hi folks,
I'm trying to reproduce Gemini 2.5 Pro's results locally, and I want to clarify a few things:
- What's the difference between `Gemini-2.5-Pro-06-05` and `Gemini-2.5-Pro-05-06`? Their pass@1 differs a lot (74.2 vs. 68.1 on 01/01/2025–05/01/2025).
- Is there an explicit list of the model hyperparameters used per submission (top_p, temperature, max_tokens, etc.)? I could only find top_p=0.95 and temperature=0.2 in the paper, and I'm not sure about the submission's setup.
- Could the model evolving over the past five months (from May to now) have impacted the benchmark results?
- In general, what's the best strategy to reproduce the leaderboard results on Gemini and other models?
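For concreteness, here is a minimal sketch of how the two settings I could find in the paper would map onto a Gemini-style request body. The `generationConfig` field names follow the public Generative Language REST API; `maxOutputTokens` is left out because I couldn't find it documented for the submission, and the prompt text is just a placeholder:

```python
import json

# Sampling settings reported in the paper. Whether the leaderboard
# submission used the same values is exactly what I'm asking above.
generation_config = {
    "temperature": 0.2,  # from the paper
    "topP": 0.95,        # from the paper
    # "maxOutputTokens": ???  -- not specified anywhere I could find
}

# Skeleton of a Gemini REST-style request body (placeholder prompt).
payload = {
    "contents": [{"role": "user", "parts": [{"text": "<problem prompt>"}]}],
    "generationConfig": generation_config,
}

print(json.dumps(payload, indent=2))
```

Is this roughly the setup the submission used, or were other parameters (candidate count, stop sequences, system instructions) set as well?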
Thanks in advance for your help!