add LinearWarmupScheduler #1537
Conversation
return self.value_at_epoch[old_index], self.value_at_epoch[index]
...
class LinearWarmupScheduler:
Hi and thanks! Is this resumable? I see a "current_step": shouldn't the scheduler be saved as well in case of resuming an experiment? This can be easily done with hooks (see the other schedulers with states). What do you think?
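For illustration of what would need to be persisted: essentially just the step counter. A minimal sketch, assuming a hypothetical `current_step` attribute and plain `torch.save`/`torch.load` rather than SpeechBrain's actual checkpoint-hook mechanism:

```python
import torch

# Illustration only: `current_step` is assumed from the comment above;
# this is not the SpeechBrain checkpoint-hook API itself.
def save_scheduler_state(scheduler, path):
    # Persist the counter so a resumed run continues the warmup where it stopped.
    torch.save({"current_step": scheduler.current_step}, path)

def load_scheduler_state(scheduler, path):
    scheduler.current_step = torch.load(path)["current_step"]
```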
I agree. On a side note, StepScheduler also does not have hooks; we should fix that in a separate PR.
Hi, this is a very good question. TBH, I am not very familiar with the concept of hooks. But I will take a look at how other schedulers are implemented.
I have added the checkpoint hooks. Please take a look at it.
Huge thanks!
I notice the design is quite different from PyTorch's native schedulers, which have a step() function as well as load_state_dict() and state_dict() functions. We also ended up changing the interface a bit, as I wanted something where you could step on both minibatches and epochs. [In our case it's not part of a unified interface, though, because for now, for flexibility in early development, our model is to put most of the complexity in local scripts rather than in any central place.]
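For context, a rough sketch of that kind of interface, illustrative only (the class name, parameters, and decay formula are assumptions, not this PR's or any project's actual code): a scheduler that can be stepped per minibatch and per epoch and is checkpointable via torch-style state_dict()/load_state_dict():

```python
# Illustrative sketch only; names and the decay formula are assumptions.
class WarmupDecaySketch:
    def __init__(self, optimizer, warmup_batches=1000, epoch_decay=0.9):
        self.optimizer = optimizer
        self.warmup_batches = warmup_batches
        self.epoch_decay = epoch_decay
        self.batch = 0
        self.epoch = 0
        self.base_lrs = [g["lr"] for g in optimizer.param_groups]

    def step_batch(self):
        # Called once per minibatch; drives the warmup.
        self.batch += 1
        self._set_lr()

    def step_epoch(self):
        # Called once per epoch; drives the decay.
        self.epoch += 1
        self._set_lr()

    def _set_lr(self):
        warmup = min(1.0, self.batch / max(1, self.warmup_batches))
        decay = self.epoch_decay ** self.epoch
        for group, base_lr in zip(self.optimizer.param_groups, self.base_lrs):
            group["lr"] = base_lr * warmup * decay

    def state_dict(self):
        # Exclude the optimizer itself, as torch schedulers do.
        return {k: v for k, v in self.__dict__.items() if k != "optimizer"}

    def load_state_dict(self, state_dict):
        self.__dict__.update(state_dict)
```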
Agreed, torch schedulers are a bit rigid, although you can use them natively with SB as well. As you may have seen, we follow the opposite direction for now: more central places and less complexity in local scripts. I guess it's a balance between how much maintenance a coordinated team can provide (central) and how much you wish to rely on the community for that (local scripts). At least, this is a personal opinion; I find it hard to properly maintain recipes as they tend to grow way too rapidly in number :p
Hm yes, for now we are aiming to get the best possible WER with reasonable latency before we add lots of recipes; at a later time we might consider centralizing things a bit. I figure if people really need recipes that work for a specific dataset, they can always get them from speechbrain or ESPNet.
The numbers you get with Transducers are really impressive; I really hope we soon get enough resources to put someone on this full-time. The last intern who tried did not succeed, but he had other things to do as well (that was the PR where he tried your nice pruned transducer loss).
Create a schedule with a learning rate that decreases linearly from the initial lr set in the optimizer to 0, after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
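As a worked illustration of that description, here is the multiplicative factor such a schedule applies to the initial lr; the parameter names `warmup_steps` and `total_steps` are placeholders, not necessarily those of the PR:

```python
# Sketch of the schedule described above; parameter names are illustrative.
def linear_warmup_then_decay(current_step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplicative factor applied to the initial lr at `current_step`."""
    if current_step < warmup_steps:
        # Warmup: factor rises linearly from 0 to 1.
        return current_step / max(1, warmup_steps)
    # Decay: factor falls linearly from 1 back to 0 at `total_steps`.
    return max(0.0, (total_steps - current_step) / max(1, total_steps - warmup_steps))

# Example with warmup_steps=4, total_steps=10:
# steps 0..4 give 0.0, 0.25, 0.5, 0.75, 1.0; steps 5..10 then decrease to 0.0.
```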