wav2vec2 pretraining implemented with speechbrain #1312
Conversation
(force-pushed from f93e06d to ed3c983)
Latest commit is the result of merging mine and guille's implementations. Best to ignore the CommonVoice files as those are not up to date; I'd like to focus on getting the LibriSpeech implementation done before we update CommonVoice.
TParcollet left a comment:
Thanks for the huge and awesome work! Please find my major comments in this review :-)
(force-pushed from 117e6e9 to 4caef3c, then from 5d9c6cf to e4d33d6)
I've refactored a lot, cleaned things up, and added more docstrings; I think it's worth going through another review. Some things I still plan to do:
TParcollet left a comment:
Thanks again @RuABraun for this amazing job! Here is my review. I think after that, I can start a big training and see. We are close to something.
Also, please resolve the conflict so that I can run the tests :p
There are definitely some changes needed for
Hi @RuABraun, what are the next steps to follow in your view?
PR #1449 needs to be merged first. I recently figured out one thing that was causing bad performance, but there's still one more issue remaining (I get significantly better performance training with 2 GPUs than with 4). Hope to figure that out in the next ~2 weeks (I was busy with interviews in the last ~6 weeks, but thankfully that's over now). Got a few interesting results from trying out different things; I'll post about them in Slack soon.
(force-pushed from 019733b to b1d5c15)
Waiting on #1518.
@TParcollet Noting two things down here to look at in the future for better performance:
(this is not meant for merge yet but to share the code)
This allows one to pretrain a wav2vec2 model without relying on fairseq or HuggingFace. It follows the fairseq implementation, though there are various differences.
Here is a plot showing WER after finetuning vs. the number of pretraining steps, comparing this implementation to fairseq (both with and without quantisation). This is on Italian CommonVoice 7.0, using the validated set for pretraining and the train set for finetuning.
I made a wrapper around the whole w2v2 object to make it easy to hold connecting objects like the masking tensor and various projection layers. The feature extractor and encoder are arguments that can be overridden. Using vector quantisation is toggleable. The implementation follows the most recent fairseq implementation (but not the very recent conformer stuff), so it uses normalise_before=True and layer norm instead of group norm.
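To illustrate the shape of such a wrapper, here is a minimal PyTorch sketch. The class and argument names (`W2V2Pretrainer`, `mask_prob`, `proj_dim`, `use_quantiser`) are assumptions for this sketch and not the actual names in the PR, and span masking plus the Gumbel-softmax quantiser are omitted for brevity.

```python
import torch
import torch.nn as nn


class W2V2Pretrainer(nn.Module):
    """Sketch of a wrapper holding the feature extractor, context encoder,
    mask embedding and the projection layers used by the contrastive
    pretraining objective (illustrative, not the PR's actual class)."""

    def __init__(self, feature_extractor, encoder, feat_dim, enc_dim,
                 proj_dim=256, mask_prob=0.065, use_quantiser=False):
        super().__init__()
        self.feature_extractor = feature_extractor  # e.g. a CNN over raw audio -> (B, T, feat_dim)
        self.encoder = encoder                      # e.g. a transformer encoder -> (B, T, enc_dim)
        # learned embedding that replaces the features at masked positions
        self.mask_emb = nn.Parameter(torch.empty(feat_dim).uniform_())
        # project context outputs and targets into a shared space for the contrastive loss
        self.out_proj = nn.Linear(enc_dim, proj_dim)
        self.target_proj = nn.Linear(feat_dim, proj_dim)
        self.use_quantiser = use_quantiser  # the full implementation uses a Gumbel-softmax codebook
        self.mask_prob = mask_prob

    def _sample_mask(self, batch, steps, device):
        # independent Bernoulli masking per frame; the real recipe masks
        # contiguous spans as in fairseq, which is omitted here
        return torch.rand(batch, steps, device=device) < self.mask_prob

    def forward(self, wav):
        feats = self.feature_extractor(wav)          # (B, T, feat_dim)
        targets = feats.detach()                     # targets come from the unmasked features
        mask = self._sample_mask(feats.size(0), feats.size(1), feats.device)
        feats = torch.where(mask.unsqueeze(-1), self.mask_emb.expand_as(feats), feats)
        context = self.encoder(feats)                # (B, T, enc_dim)
        preds = self.out_proj(context)
        if self.use_quantiser:
            # placeholder: quantise `targets` with the codebook before projecting
            pass
        targets = self.target_proj(targets)
        return preds, targets, mask
```

The contrastive loss would then compare `preds` against `targets` at the masked positions, using sampled negatives; that part is left out of the sketch.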
One can use the existing finetuning script inside `/ASR/CTC` to train the pretrained model (a rough sketch of that hand-off is shown below).
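As an illustration only (not the actual recipe wiring; the function name and checkpoint path here are hypothetical), one way to hand the pretrained encoder over to the CTC finetuning setup is to export its weights and reload them into the encoder wrapped by the CTC model:

```python
import torch
import torch.nn as nn


def transfer_pretrained_encoder(pretrained_encoder: nn.Module,
                                finetune_encoder: nn.Module,
                                ckpt_path: str = "w2v2_encoder.ckpt") -> None:
    """Save the pretrained encoder weights and load them into the
    (architecturally identical) encoder used for CTC finetuning."""
    torch.save(pretrained_encoder.state_dict(), ckpt_path)
    state = torch.load(ckpt_path, map_location="cpu")
    finetune_encoder.load_state_dict(state)
```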
Minor notes and TODOs: