Skip to content

Conversation

@asumagic
Copy link
Collaborator

@asumagic asumagic commented Feb 24, 2023

Closes #1860. The changes to the document are quite in-depth and there is new added info, so this PR needs to be fact-checked aside of proofreading.

The resulting compiled document looks as intended.

Overview of the changes

  • The document was generally rewritten to be better laid out and is significantly more detailed.
  • DP is explicitly recommended against and is noted as deprecated, which seems to be the general stance of SB core contributors. (I made it go as far as saying SB does not support it anymore, considering we generally always tell DP users to use DDP instead when they face DP-related bugs)
  • I tried to make the phrasing much more explicit in what DDP enables and how multi-GPU training works.
  • "Writing DDP-safe code" gets its own, more detailed subsection.

Additional notes

The Colab tutorial (which was brought up in the issue here) is not linked in the document as of this PR. It should probably be updated to the contents of this PR, but as unless it actually gets any different content from the documentation, it probably doesn't need to be linked from the docs.

@anautsch (in DMs) mentioned DDP tests and some changes from #1817 that indirectly affect DDP. This PR mentions neither at the time, but perhaps someone more qualified than me could enhance the DDP-safety subsection further.

@asumagic
Copy link
Collaborator Author

Ugh, pre-commit checks fail to intended trailing whitespaces in the markdown. I'll remove them for now.

This made the pre-commit hooks fail, even though double trailing spaces were intended for line returns without starting a new paragraph.
@Adel-Moumen
Copy link
Collaborator

Thanks @asumagic!

@asumagic
Copy link
Collaborator Author

I made the suggested changes and added details on what the parameters to torch.distributed.launch mean.

Copy link
Collaborator

@Adel-Moumen Adel-Moumen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! I really like your changes. I think it will help a lot of ppl! :-)

@Adel-Moumen Adel-Moumen merged commit e8cd583 into speechbrain:develop Feb 24, 2023
@asumagic asumagic deleted the doc-multigpu-rewrite branch February 24, 2023 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request]: multi-GPU docs should be improved/updated

2 participants