Conversation

@TParcollet
Collaborator

TParcollet commented Jan 3, 2023

This PR refactors our transducer recipe for LibriSpeech. In practice, we drop the current CRDNN, which has never even been trained on the full set, in favour of a better-performing conformer_transducer.

  • Develop a first working multitask (CTC only for now) conformer transducer, with RNNLM. @TParcollet
  • Refactor the recipe to be simpler. @TParcollet
  • Go below 3% WER. Now at 2.8% on test-clean with beam search and RNNLM.

Todo in another PR

  • Entirely refactor the transducer beam search to match our new interface. @Adel-Moumen
  • Turn this recipe into something that works with TransformerLM and match ESPnet's 2.4%.
  • Port all of this to the other transducer recipes (CommonVoice, I believe).

Collaborator

asumagic left a comment

Made some general remarks and pointed out one issue in the hparams. Other than that, LGTM; I will be retraining a model shortly.

@Adel-Moumen
Collaborator

Hey @TParcollet, any news please? Thanks. :-)

@asumagic
Collaborator

Taking the liberty of pushing my current code (without the streaming stuff, only what gets me to 2.8%), but bear in mind that I'm currently fighting git a bit.

@mravanelli
Collaborator

mravanelli commented Jul 18, 2023 via email

asumagic force-pushed the libri_conformer_large branch from 82d72c8 to 167b976 on July 18, 2023 14:23
asumagic dismissed their stale review on July 18, 2023 15:10

Mostly fixed

@asumagic
Collaborator

I have yet to upload the checkpoint and edit the README with the new results; I will do this tomorrow. The code side of things is ready if you prefer not to delay the review. With my changes, using gradient averaging, we reach a greedy-search test-clean WER of 2.84%, which was roughly the goal.

There is one issue I had to work around: in TransformerASR, custom_tgt_module is initialized regardless of whether transformer decoding is used at all, which is not the case here.
This is a problem because it broke DDP for me, which flagged it as an unused parameter. It also adds a useless parameter (~0.5M) to trained model checkpoints.
The Transformer class (which TransformerASR extends) properly gates that initialization behind an if num_decoder_layers > 0: check. The problem is that simply adding this check will not cut it, as it will break every existing trained model: their checkpoints include keys for a custom_tgt_module that would no longer be initialized.
I am not sure what the best way to fix this is, so for now, to avoid delaying things, I reintroduced the useless parameter into my checkpoint after finishing training so that this code can be left unchanged.
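
To make the problem concrete, here is a minimal, self-contained sketch of the situation, not the actual SpeechBrain code: the class, layer sizes, and the gate_tgt_module flag are made up for illustration; only custom_tgt_module and num_decoder_layers come from the discussion above.

```python
import torch
import torch.nn as nn


class EncoderOnlyASR(nn.Module):
    """Toy stand-in for a TransformerASR-like model with no transformer decoder."""

    def __init__(self, d_model=256, vocab_size=1000, num_decoder_layers=0,
                 gate_tgt_module=False):
        super().__init__()
        self.encoder = nn.Linear(80, d_model)  # stand-in for the real encoder
        # Current behaviour: custom_tgt_module is created even when no decoder is
        # used, so DDP flags an unused parameter and every checkpoint carries an
        # extra embedding (~0.5M parameters in the real model).
        if not gate_tgt_module or num_decoder_layers > 0:
            self.custom_tgt_module = nn.Embedding(vocab_size, d_model)


old = EncoderOnlyASR(gate_tgt_module=False)  # how existing checkpoints were trained
new = EncoderOnlyASR(gate_tgt_module=True)   # with the num_decoder_layers > 0 gate

# Loading an old checkpoint into the gated model fails with the default
# strict=True, because the checkpoint has custom_tgt_module.* keys that the
# gated model no longer owns.
try:
    new.load_state_dict(old.state_dict())
except RuntimeError as e:
    print("strict load fails:", e)
```

On the DDP side alone, find_unused_parameters=True would also silence the unused-parameter error (at some runtime cost), but it would not remove the wasted parameter from the checkpoints.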

@mravanelli
Collaborator

mravanelli commented Jul 18, 2023

@TParcollet, @asumagic thank you for this PR!

  • I did some tests and everything LGTM. I ran the recipe in debug mode and the code looks runnable end-to-end. I couldn't test the DDP part, as I ran my test on a single GPU (with a reduced batch size).

Regarding DDP, here are my questions:

  • @TParcollet, did you observe the problem mentioned by @asumagic?
  • @asumagic, do you see a way to fix this issue in a backward-compatible way? Maybe we can write a custom load_state_dict that loads the optional parameters only if they are available in the checkpoint.

@TParcollet
Collaborator Author

@asumagic what about gating the custom_tgt_module as well if num_decoder_layers == 0?

@asumagic
Collaborator

@asumagic what about gating the custom_tgt_module as well if num_decoder_layers == 0?

What do you mean precisely? Only instantiating the custom_tgt_module if num_decoder_layers > 0 is indeed possible, but without further changes it will break existing trained models that have no decoder layers. Quite a few recipes seem to use num_decoder_layers: 0.

Maybe we can write a custom load_state_dict that loads the optional parameters only if they are available in the checkpoint.

Possibly, but I am not sure what the best way to do it is. If the caller code (i.e. not TransformerASR itself) has to be modified, then it would be a breaking change. Ideally, the workaround would live at the TransformerASR level... but I don't know whether that's possible.
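
To illustrate the kind of module-level workaround I have in mind, here is a toy sketch of the "custom load_state_dict" idea, not SpeechBrain's actual API: ToyASR and its sizes are hypothetical, and only custom_tgt_module / num_decoder_layers come from the discussion above. The point is that old checkpoints keep loading without any change to caller code.

```python
import torch
import torch.nn as nn


class ToyASR(nn.Module):
    """Hypothetical stand-in for a TransformerASR-like model."""

    def __init__(self, d_model=256, vocab_size=1000, num_decoder_layers=0):
        super().__init__()
        self.encoder = nn.Linear(80, d_model)
        if num_decoder_layers > 0:  # gated init, as in the base Transformer class
            self.custom_tgt_module = nn.Embedding(vocab_size, d_model)

    def load_state_dict(self, state_dict, strict=True):
        # Old checkpoints may still carry the embedding even though the module is
        # gated off here; drop those keys instead of failing on "unexpected key".
        if not hasattr(self, "custom_tgt_module"):
            state_dict = {
                k: v for k, v in state_dict.items()
                if not k.startswith("custom_tgt_module.")
            }
        return super().load_state_dict(state_dict, strict=strict)


# An "old-style" checkpoint that still contains the unused embedding:
old_ckpt = ToyASR(num_decoder_layers=0).state_dict()
old_ckpt["custom_tgt_module.weight"] = torch.zeros(1000, 256)

ToyASR(num_decoder_layers=0).load_state_dict(old_ckpt)  # now loads without error
```

The reverse direction (loading a new, slimmer checkpoint into an old model that still instantiates the module) would still report a missing key, so this filter only buys backward compatibility, not forward.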

@asumagic
Collaborator

Updated the README with all results and uploaded the checkpoint files. It took me a bit because I had to rerun evaluation.

The current checkpoint includes the useless custom_tgt_module at the moment.

@asumagic
Collaborator

Since the Dropbox PR was merged, this now conflicts. I also forgot that the LibriSpeech.csv recipe file should be updated with a link.

But let's first decide what to do about custom_tgt_module, as a checkpoint re-upload may be needed. The current checkpoint contains the unnecessary parameter (it still works; this is the workaround for now).

Other than that, it should be all good AFAIK.

@TParcollet
Collaborator Author

If you can think of a quick and easy way of fixing the incompatibility that would arise between existing checkpoints and the removal of custom_tgt_module, then let's fix it. Otherwise, we move forward.

@asumagic
Collaborator

In that case, IMO we can move forward. I fixed minor conflicts against develop (from the Dropbox PR) and updated the README and recipe CSV. There should be no functional changes since @mravanelli's check.

@mravanelli
Collaborator

Thank you @asumagic,

  • It looks like the log folder is still in gdrive, while we need to switch to Dropbox. Do you want me to upload to the speechbrain dropbox and change the link?
  • It looks like the DDP issue is important, but it might be fixed in a new PR (maybe for the new major version of SpeechBrain). What do you suggest we do, @TParcollet?

@asumagic
Collaborator

It looks like the log folder is still in gdrive, while we need to switch to Dropbox. Do you want me to upload to the speechbrain dropbox and change the link?

Sure @mravanelli !

mravanelli self-requested a review on July 20, 2023 18:35
mravanelli merged commit e36743f into speechbrain:develop on Jul 20, 2023