
Conversation


@Adel-Moumen Adel-Moumen commented Aug 30, 2024

What does this PR do?

This PR adds support for a new function called seed_everything, which tries to maximize reproducibility.

When using two processes, it prints:

INFO:speechbrain.utils.seed:[rank: 1] Setting seed to 3403
INFO:speechbrain.utils.seed:[rank: 0] Setting seed to 3402

While for one:

INFO:speechbrain.utils.seed:[rank: 0] Setting seed to 3402

In effect, we offset the seed by the rank of the current process.
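The rank-offset idea can be sketched as follows. This is a minimal stdlib-only illustration of the behaviour described above, not the actual SpeechBrain implementation (which also seeds NumPy and PyTorch); the use of the RANK environment variable is an assumption based on common torch.distributed launchers.

```python
import os
import random


def seed_everything(seed: int = 0) -> int:
    """Seed the global RNGs, offsetting the seed by the process rank so
    that each distributed worker draws a different random stream."""
    # Distributed launchers typically export RANK; default to 0 for a
    # single-process run.
    rank = int(os.environ.get("RANK", "0"))
    seed = seed + rank
    random.seed(seed)
    # The real helper would also seed the other RNGs here, e.g.
    # np.random.seed(seed) and torch.manual_seed(seed).
    return seed
```

With a base seed of 3402 this reproduces the log above: rank 0 gets 3402 and rank 1 gets 3403.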

Before submitting
  • Did you read the contributor guideline?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Does your code adhere to project-specific code style and conventions?

PR review

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Confirm that the changes adhere to compatibility requirements (e.g., Python version, platform)
  • Review the self-review checklist to ensure the code is ready for review

@Adel-Moumen Adel-Moumen marked this pull request as ready for review August 30, 2024 12:13
@Adel-Moumen Adel-Moumen requested a review from asumagic August 30, 2024 12:13
@Adel-Moumen Adel-Moumen self-assigned this Aug 30, 2024
@Adel-Moumen Adel-Moumen requested a review from TParcollet August 30, 2024 12:13
@Adel-Moumen Adel-Moumen added this to the v1.0.2 milestone Aug 30, 2024

@pplantinga pplantinga left a comment

Looks like a good PR that better follows PyTorch recommendations about randomness. I have a few minor comments but it is good enough that it could be merged now.


However, due to differences in how GPU and CPU execution works, results may not be fully reproducible even with identical seeds; this primarily affects training experiments.

On the other hand, the output of the data preparation scripts is independent of the global seeds. This ensures that you will get identical outputs on different setups, even if different seeds are used.

Maybe we could expand this to explain important details about distributed experiments: different seeds will be set on different machines, which will affect things like augmentations, but not things like initial model parameters or data loaders.
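The split described above can be illustrated with a toy stdlib-only sketch (not SpeechBrain code; the function names and base seed are hypothetical): draws that should differ across ranks use the rank-offset seed, while model initialization uses the shared base seed on every rank.

```python
import random

BASE_SEED = 3402  # hypothetical base seed shared by all ranks


def init_params(n: int) -> list:
    """Model parameters come from the base seed, so every rank starts
    from identical weights."""
    rng = random.Random(BASE_SEED)
    return [rng.random() for _ in range(n)]


def augmentation_draws(rank: int, n: int) -> list:
    """Augmentation randomness uses the rank-offset seed, so each rank
    perturbs its shard of the data differently."""
    rng = random.Random(BASE_SEED + rank)
    return [rng.random() for _ in range(n)]
```

Here init_params is identical on every rank, while augmentation_draws differs between rank 0 and rank 1.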

Collaborator Author

Done. Let me know what you think


@pplantinga pplantinga left a comment


LGTM!

@pplantinga pplantinga merged commit eb13b9e into speechbrain:develop Sep 12, 2024