Avoid loading checkpoint parameters on the target device #1743
Conversation
When `map_location=device`, the checkpoint parameters are first loaded on the CPU (see the docs of `torch.load`) and then moved to the target device. This means that, before the checkpoint parameters are copied into the model, two full independent copies of the model parameters exist on the target device: the first is the set of parameters already in the model, the second is the set of parameters being recovered from the checkpoint. If the model is on the CPU, this memory overhead cannot be avoided. However, if the model is on a device other than the CPU (e.g. "cuda"), we can avoid moving the loaded checkpoint parameters to that device, and hence avoid wasting device memory and potentially hitting an out-of-memory error with huge models. Since `obj.load_state_dict` copies the loaded parameters into the model/optimizer/scheduler in place, one by one, it automatically takes care of moving them to the model's device, even if they are on the CPU.
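A minimal sketch of the idea (the module and checkpoint path below are placeholders for illustration, not SpeechBrain's actual code):

```python
import torch
import torch.nn as nn

# Placeholder model and checkpoint path, just to illustrate the memory pattern.
model = nn.Linear(1024, 1024).to("cuda")  # parameters already live on the GPU

# Wasteful: materializes a second full copy of the parameters on the GPU
# before load_state_dict even runs.
# state_dict = torch.load("model.ckpt", map_location="cuda")

# Preferred: keep the loaded tensors on the CPU; load_state_dict then copies
# each one in place into the existing GPU parameters, moving data as needed.
state_dict = torch.load("model.ckpt", map_location="cpu")
model.load_state_dict(state_dict)
```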
|
@Gastron could you please take a look? |
|
This is a good idea, I think we should go through with this change. Like noted, the `torch.load` documentation suggests this, for example. However, the `device` argument is still used elsewhere, e.g. speechbrain/core.py line 827 (at 46be2d1).
@lucadellalib would you be willing to go over the codebase and remove the unnecessary `device` arguments? |
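To make the scope concrete, here is a hypothetical before/after sketch of such a loading hook (function and argument names are illustrative, not SpeechBrain's actual API):

```python
import torch

# Before (illustrative): the hook takes a device and materializes the whole
# checkpoint there, duplicating the parameters on that device.
def load_checkpoint_old(obj, path, device):
    state_dict = torch.load(path, map_location=device)
    obj.load_state_dict(state_dict)

# After (illustrative): always load on the CPU; load_state_dict copies each
# tensor in place into obj's parameters on whatever device they already live.
def load_checkpoint_new(obj, path):
    state_dict = torch.load(path, map_location="cpu")
    obj.load_state_dict(state_dict)
```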
|
@Gastron I think it would be better to leave it as it is to minimize the number of changes (easier to debug in case of problems, lower risk of breaking other components, etc.) and especially for backward compatibility. Some users might be using those functions and suddenly find their code not working anymore because they are passing what is now an unexpected argument (`device`). |
|
What about adding the change in the unstable branch? This way it will be available for the next major version (where some interface changes are planned).
|
|
I think it would be better to take the whole step of removing the `device` argument. However, I agree with Mirco that this is a breaking change, meaning we should only make it in the next major version. Of course we understand if @lucadellalib doesn't want to take the time to remove the argument from so many places; how do you feel, Luca? Perhaps someone else should contribute that part? |
|
Hi Luca,
feel free to move on with it and open a PR on the "unstable" branch (the one for the new version of SpeechBrain). Note that we will soon merge the recipe testing PR that allows us to test every single recipe.
Best,
Mirco
|
|
@Gastron @mravanelli I removed the unnecessary `device` arguments. |
|
Great work! I browsed all the changes; looks good to me. This touches a lot of recipes, so is it ok to merge this now, before the recipe testing PR, @anautsch? |
|
lgtm! @Gastron we'll see what breaks & fix it - but this PR's changes seem complementary. Thank you, @lucadellalib ! |