Conversation

@Adel-Moumen
Collaborator

Hello,

In the file distributed.py, some error messages suggest using --distributed_launch=True, which is misleading because that is not what the user is supposed to do. According to the official SpeechBrain documentation, anyone who wants to use DDP needs to pass --distributed_launch instead of --distributed_launch=True -> https://speechbrain.readthedocs.io/en/v0.5.8/multigpu.html

Moreover, --distributed_launch=True simply does not work. I ran a small experiment on my GPU cluster, training an LSTM on the CommonVoice French recipe of SpeechBrain using 2x V100s:

with --distributed_launch=True I got an error: torch.distributed.elastic.multiprocessing.errors.ChildFailedError ......
whereas --distributed_launch does everything as expected, i.e. it trains the LSTM.
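The observed behavior is consistent with how boolean CLI switches are usually declared in Python. A minimal sketch (this is not SpeechBrain's actual parser; the assumption here is that the flag is declared with argparse's `store_true` action) shows why the bare flag works while `--distributed_launch=True` is rejected:

```python
import argparse

# Hypothetical parser mirroring a store_true boolean switch.
parser = argparse.ArgumentParser()
parser.add_argument("--distributed_launch", action="store_true")

# Bare flag: its mere presence sets the value to True.
args = parser.parse_args(["--distributed_launch"])
print(args.distributed_launch)  # True

# Explicit value: store_true takes no argument, so argparse
# errors out ("ignored explicit argument 'True'") and exits.
try:
    parser.parse_args(["--distributed_launch=True"])
except SystemExit:
    print("argparse rejected --distributed_launch=True")
```

So any error message or recipe comment suggesting `=True` points users toward a form the parser cannot accept.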

I got the same report from a colleague who tried to use DDP and was stuck because the error messages suggested setting the flag to True.

P.S.: Every comment in the recipes that suggests this also needs to be changed, e.g. # If distributed_launch=True in the CommonVoice ASR seq2seq train.py, but I'm waiting for an official reviewer to confirm the problem first.

Best,

@mravanelli mravanelli requested a review from TParcollet July 19, 2022 13:52
@TParcollet
Collaborator

I am aware of that :-( I will fix the doc, but for the recipes ... well ...

@TParcollet left a comment

Approved.
