[FIX] Flush gradients and save memory for validation. #1739
Conversation
I also spotted that gradients are not accumulated correctly, so I fixed that too.
Thank you for this PR @MartinKocour. I think it can be useful. One thing I'm not sure about is why calling the function you defined (
Mirco @mravanelli, in recipes I am not deleting anything. My You are right, I don't have to call In
pplantinga
left a comment
Nice change! I especially appreciate saving space for evaluation and fixing the gradient accumulation bug. Quick question, have you verified that this does in fact save space for evaluation and there's nothing else (e.g. manual garbage collection) that is needed?
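One way to sanity-check the claim without a GPU is to model the two `zero_grad` behaviours and compare how much gradient-buffer memory stays alive. The sketch below is purely illustrative, not SpeechBrain code: `MockParam`, `grad_bytes`, and the standalone `zero_grad` helper are hypothetical names standing in for `torch.nn.Parameter` and `optimizer.zero_grad`.

```python
class MockParam:
    """Stands in for a torch.nn.Parameter whose .grad was filled by backward()."""
    def __init__(self, numel):
        self.numel = numel
        self.grad = [0.0] * numel  # pretend a backward pass allocated this buffer

def grad_bytes(params, bytes_per_elem=4):
    """Approximate memory held by live gradient buffers (fp32 = 4 bytes/elem)."""
    return sum(p.numel * bytes_per_elem for p in params if p.grad is not None)

def zero_grad(params, set_to_none=False):
    """Mimics optimizer.zero_grad(): zero the buffers, or drop them entirely."""
    for p in params:
        if set_to_none:
            p.grad = None             # buffer can now be garbage-collected
        else:
            p.grad = [0.0] * p.numel  # buffer stays allocated, just zeroed

params = [MockParam(1000), MockParam(2000)]
print(grad_bytes(params))  # 12000: bytes held after the "backward" pass

zero_grad(params, set_to_none=False)
print(grad_bytes(params))  # 12000: zeros occupy exactly the same memory

zero_grad(params, set_to_none=True)
print(grad_bytes(params))  # 0: buffers released, freeing room for validation
```

On a real model the same comparison could be made with `torch.cuda.memory_allocated()` before and after each variant of `zero_grad`.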
This reverts commit 30e8def.
Ready to be merged
pplantinga
left a comment
Nice! LGTM
When `fit_train` is done, we should call `optimizer.zero_grad(set_to_none=True)`, which forces the gradients to be cleared from memory. In the current version of the code, `optimizer.zero_grad()` is the last call, which instead stores zeros in memory. This minor change should allow a bigger batch size during `_fit_valid` or `evaluate` and thus speed them up. This is especially useful with large models, like SepFormer, that have a significant memory footprint. More details can be found in the PyTorch Performance Tuning Guide.
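A minimal PyTorch illustration of the difference (assuming PyTorch ≥ 1.7, where `zero_grad` accepts `set_to_none`; this is a toy model, not the SpeechBrain trainer itself):

```python
import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step: backward() allocates a .grad buffer per parameter.
loss = model(torch.randn(8, 4)).sum()
loss.backward()
opt.step()

# Plain zeroing keeps the zero-filled gradient tensors allocated.
opt.zero_grad(set_to_none=False)
assert all(p.grad is not None for p in model.parameters())

# set_to_none=True drops the buffers entirely, so their memory can be
# reclaimed before running validation with a larger batch.
opt.zero_grad(set_to_none=True)
assert all(p.grad is None for p in model.parameters())
```

Note that recent PyTorch releases (2.0+) default `zero_grad` to `set_to_none=True`, so passing the flag explicitly mainly matters on older versions like the one this PR targets.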