Conversation

@MartinKocour
Collaborator

When fit_train is done we should call optimizer.zero_grad(set_to_none=True), which forces the gradients to be cleared from memory. In the current version of the code, the last call is optimizer.zero_grad(), which instead keeps the gradient tensors allocated and fills them with zeros.
This minor change should allow using a bigger batch during _fit_valid or evaluate and thus speed it up. This is especially useful when you use large models like SepFormer with a significant memory footprint.

More details can be found in the PyTorch Performance Tuning Guide.
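
For context, here is a minimal PyTorch sketch of the difference between the two calls (the toy model below is only for illustration and is not SpeechBrain code):

```python
import torch

# Toy model and optimizer, used only to illustrate the two zero_grad variants.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()

# Default behaviour: gradients are overwritten with zeros but stay allocated.
optimizer.zero_grad()
print(model.weight.grad)  # tensor of zeros, memory still in use

# set_to_none=True: gradient tensors are released, so their memory can be
# reused, e.g. for a larger batch size during validation or evaluation.
optimizer.zero_grad(set_to_none=True)
print(model.weight.grad)  # None
```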

@MartinKocour
Collaborator Author

I also spotted that gradients are not accumulated correctly, so I fixed that too.
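
(The fix itself is in the PR diff; the snippet below is only an illustrative sketch of a correct accumulation pattern, with compute_loss, train_loader, and grad_accumulation_factor as placeholder names.)

```python
# Hypothetical accumulation loop: step and clear gradients only once every
# `grad_accumulation_factor` batches, and scale the loss so the accumulated
# gradient matches what a single larger batch would produce.
grad_accumulation_factor = 4
for step, batch in enumerate(train_loader, start=1):
    loss = compute_loss(batch)  # placeholder for the recipe's loss computation
    (loss / grad_accumulation_factor).backward()
    if step % grad_accumulation_factor == 0:
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)
```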

@mravanelli
Collaborator

Thank you for this PR @MartinKocour, I think it can be useful. One thing I'm not sure about is why the function you defined (zero_grad) is also called in fit_batch. As you can see, in core.py you still call the original function self.optimizer.zero_grad(), but you modified the recipe files so that they call zero_grad in their fit_batch. I'm not sure we need to delete the gradient buffers at the end of each batch (might this slow things down?); we likely only need to do it at the end of a training epoch, right?

@MartinKocour
Collaborator Author

MartinKocour commented Dec 4, 2022

Mirco @mravanelli, in the recipes I am not deleting anything. My zero_grad(set_to_none=False) has the same behaviour as the original torch optimizer.zero_grad(), i.e. by default it does not delete the gradients from memory; it just zeros them when set_to_none=False (the default).

You are right, I don't have to call self.zero_grad() in the recipes. I just didn't want to duplicate the same lines of code. It is not necessary, though, and I can remove it. Either way, the behaviour with self.zero_grad() is the same as with the original code.

In core.py I did not use it, since it is just a single line of code, and calling optimizer.zero_grad() directly is faster than calling Brain.zero_grad(), which contains an additional if statement.
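
(Editorial note: below is a minimal sketch of the kind of Brain.zero_grad wrapper being discussed, assuming a simplified Brain class; see the actual PR diff for the real implementation.)

```python
import torch


class Brain:
    """Simplified stand-in for speechbrain.core.Brain, for illustration only."""

    def __init__(self, modules, opt_class):
        self.modules = modules
        self.optimizer = opt_class(self.modules.parameters())

    def zero_grad(self, set_to_none=False):
        # The check below is the "additional if statement" mentioned above:
        # only clear gradients when an optimizer has actually been created.
        if hasattr(self, "optimizer"):
            self.optimizer.zero_grad(set_to_none=set_to_none)


# Example usage with placeholder modules and optimizer class.
brain = Brain(torch.nn.Linear(10, 10), torch.optim.Adam)
brain.zero_grad(set_to_none=True)
```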

Collaborator

@pplantinga pplantinga left a comment

Nice change! I especially appreciate saving space for evaluation and fixing the gradient accumulation bug. Quick question, have you verified that this does in fact save space for evaluation and there's nothing else (e.g. manual garbage collection) that is needed?
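
(One quick way to check, for anyone curious, is to compare peak GPU memory during evaluation with and without the change, e.g. with the sketch below; run_evaluation is a placeholder for the actual evaluate call.)

```python
import torch

torch.cuda.reset_peak_memory_stats()
run_evaluation()  # placeholder, e.g. brain.evaluate(test_set, ...)
peak_mib = torch.cuda.max_memory_allocated() / 1024**2
print(f"Peak GPU memory during evaluation: {peak_mib:.1f} MiB")
```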

@MartinKocour MartinKocour changed the title from "[WIP] Optimizer" to "[FIX] Flush gradients and save memory for validation." on Dec 7, 2022
@MartinKocour
Collaborator Author

Ready to be merged

Collaborator

@pplantinga pplantinga left a comment

Nice! LGTM

@anautsch anautsch merged commit d7ce222 into speechbrain:develop Dec 8, 2022