Conversation

@TParcollet
Collaborator

TParcollet commented Jan 3, 2023

This PR refactors our transducer recipe for LibriSpeech. In practice, we drop the current CRDNN, which has never even been trained on the full set, in favour of a better-performing conformer_transducer.

  • Develop a first working multitask (CTC only for now) conformer transducer, with RNNLM. @TParcollet
  • Refactor the recipe to be simpler. @TParcollet
  • Go below 3% WER. Now at 2.8% on test-clean with beam search and RNNLM.

Todo in another PR

  • Entirely refactor the transducer beam search to match our new interface. @Adel-Moumen
  • Turn this recipe into something that works with TransformerLM and match ESPnet's 2.4%.
  • Port all of this to the other transducer recipes (CommonVoice, I believe).

Collaborator

asumagic left a comment

Made some general remarks and pointed out one issue in the hparams. Other than that, LGTM; I will be retraining a model shortly.

@Adel-Moumen
Collaborator

Hey @TParcollet, any news please? Thanks. :-)

@asumagic
Collaborator

Taking the liberty of pushing my current code (without the streaming stuff, only what gets me to 2.8%), but bear in mind that I'm currently fighting git a bit.

@mravanelli
Collaborator

mravanelli commented Jul 18, 2023 via email

asumagic force-pushed the libri_conformer_large branch from 82d72c8 to 167b976 on July 18, 2023 14:23
asumagic dismissed their stale review on July 18, 2023 15:10

Mostly fixed

@asumagic
Collaborator

I have yet to upload the checkpoint and edit the README with the new results; I will do this tomorrow. The code side of things is ready if you prefer not to delay the review. With my changes, using gradient averaging, we reach a greedy-search test-clean WER of 2.84%, which was roughly the goal.

There is one issue I had to work around: in TransformerASR, custom_tgt_module is initialized regardless of whether transformer decoding is used at all, which is not the case here.
This is a problem because it broke DDP for me, which flagged it as an unused parameter. It also adds a useless parameter (~0.5M) to trained model checkpoints.
The Transformer class (which TransformerASR extends) properly gates that initialization behind an if num_decoder_layers > 0: check. The problem is that simply adding this check will not cut it, as it will break every existing trained model: their checkpoints include keys for a custom_tgt_module that would no longer be initialized.
I am not sure what the best way to fix this is, so for now, to avoid delaying things, I reintroduced the useless parameter into my checkpoint after finishing training so that this code can be left unchanged.
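
To make the problem concrete, here is a minimal, self-contained sketch of the situation, not the actual SpeechBrain code: the class, layer sizes, and the gate_tgt_module flag are made up for illustration; only custom_tgt_module and num_decoder_layers come from the discussion above.

```python
import torch
import torch.nn as nn


class EncoderOnlyASR(nn.Module):
    """Toy stand-in for a TransformerASR-like model with no transformer decoder."""

    def __init__(self, d_model=256, vocab_size=1000, num_decoder_layers=0,
                 gate_tgt_module=False):
        super().__init__()
        self.encoder = nn.Linear(80, d_model)  # stand-in for the real encoder
        # Current behaviour: custom_tgt_module is created even when no decoder is
        # used, so DDP flags an unused parameter and every checkpoint carries an
        # extra embedding (~0.5M parameters in the real model).
        if not gate_tgt_module or num_decoder_layers > 0:
            self.custom_tgt_module = nn.Embedding(vocab_size, d_model)


old = EncoderOnlyASR(gate_tgt_module=False)  # how existing checkpoints were trained
new = EncoderOnlyASR(gate_tgt_module=True)   # with the num_decoder_layers > 0 gate

# Loading an old checkpoint into the gated model fails with the default
# strict=True, because the checkpoint has custom_tgt_module.* keys that the
# gated model no longer owns.
try:
    new.load_state_dict(old.state_dict())
except RuntimeError as e:
    print("strict load fails:", e)
```

On the DDP side alone, find_unused_parameters=True would also silence the unused-parameter error (at some runtime cost), but it would not remove the wasted parameter from the checkpoints.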

@mravanelli
Collaborator

mravanelli commented Jul 18, 2023

@TParcollet, @asumagic thank you for this PR!

  • I did some tests and everything LGTM. I ran the recipe in debug mode and the code looks runnable end-to-end. I couldn't test the DDP part, as I ran my test on a single GPU (with a reduced batch size).

Regarding DDP, here are my questions:

  • @TParcollet, did you observe the problem mentioned by @asumagic?
  • @asumagic, do you see a way to fix this issue in a backward-compatible way? Maybe we can write a custom load_state_dict that loads the optional parameters only if they are available in the checkpoint.

@TParcollet
Collaborator Author

@asumagic what about gating the custom_tgt_module as well if num_decoder_layers == 0?

@asumagic
Collaborator

@asumagic what about gating the custom_tgt_module as well if num_decoder_layers == 0?

What do you mean precisely? Only instantiating the custom_tgt_module if num_decoder_layers > 0 is indeed possible, but without further changes it will break existing trained models that have no decoder layers. Quite a few recipes seem to use num_decoder_layers: 0.

Maybe we can write a custom load_state_dict that loads the optional parameters only if they are available in the checkpoint.

Possibly, but I am not sure what the best way to do it is. If the caller code (i.e. not TransformerASR itself) has to be modified, then it would be a breaking change. Ideally, the workaround would live at the TransformerASR level... but I don't know whether that's possible.
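
To illustrate the kind of module-level workaround I have in mind, here is a toy sketch of the "custom load_state_dict" idea, not SpeechBrain's actual API: ToyASR and its sizes are hypothetical, and only custom_tgt_module / num_decoder_layers come from the discussion above. The point is that old checkpoints keep loading without any change to caller code.

```python
import torch
import torch.nn as nn


class ToyASR(nn.Module):
    """Hypothetical stand-in for a TransformerASR-like model."""

    def __init__(self, d_model=256, vocab_size=1000, num_decoder_layers=0):
        super().__init__()
        self.encoder = nn.Linear(80, d_model)
        if num_decoder_layers > 0:  # gated init, as in the base Transformer class
            self.custom_tgt_module = nn.Embedding(vocab_size, d_model)

    def load_state_dict(self, state_dict, strict=True):
        # Old checkpoints may still carry the embedding even though the module is
        # gated off here; drop those keys instead of failing on "unexpected key".
        if not hasattr(self, "custom_tgt_module"):
            state_dict = {
                k: v for k, v in state_dict.items()
                if not k.startswith("custom_tgt_module.")
            }
        return super().load_state_dict(state_dict, strict=strict)


# An "old-style" checkpoint that still contains the unused embedding:
old_ckpt = ToyASR(num_decoder_layers=0).state_dict()
old_ckpt["custom_tgt_module.weight"] = torch.zeros(1000, 256)

ToyASR(num_decoder_layers=0).load_state_dict(old_ckpt)  # now loads without error
```

The reverse direction (loading a new, slimmer checkpoint into an old model that still instantiates the module) would still report a missing key, so this filter only buys backward compatibility, not forward.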

@asumagic
Collaborator

Updated the README with all results and uploaded the checkpoint files. It took me a bit because I had to rerun evaluation.

The current checkpoint includes the useless custom_tgt_module at the moment.

@asumagic
Collaborator

Since the Dropbox PR was merged, this now conflicts. I also forgot that the LibriSpeech.csv recipe file should be updated with a link.

But let's first decide what to do about custom_tgt_module, as a checkpoint re-upload may be needed. The current checkpoint contains the unnecessary parameter (it still works; this is the workaround for now).

Other than that, it should be all good AFAIK.

@TParcollet
Collaborator Author

If you can think of a quick and easy way of fixing the incompatibility that would arise between existing checkpoints and the removal of custom_tgt_module, then let's fix it. Otherwise, we move forward.

@asumagic
Collaborator

In that case, IMO we can move forward. I fixed minor conflicts against develop (from the Dropbox PR) and updated the README and recipe CSV. There should be no functional changes since @mravanelli's check.

@mravanelli
Collaborator

Thank you @asumagic,

  • It looks like the log folder is still in gdrive, while we need to switch to Dropbox. Do you want me to upload to the speechbrain dropbox and change the link?
  • It looks like the DDP issue is important, but it might be fixed in a new PR (maybe for the new major version of SpeechBrain). What do you suggest we do, @TParcollet?

@asumagic
Collaborator

It looks like the log folder is still in gdrive, while we need to switch to Dropbox. Do you want me to upload to the speechbrain dropbox and change the link?

Sure @mravanelli !

mravanelli self-requested a review on July 20, 2023 18:35
mravanelli merged commit e36743f into speechbrain:develop on Jul 20, 2023