
Conversation

@RuABraun (Collaborator) commented Feb 25, 2022

(This is not meant for merge yet, but to share the code.)

This allows one to pretrain a wav2vec2 model without relying on fairseq or HuggingFace. It follows the fairseq implementation, though there are various differences.

Here is a plot showing WER after finetuning versus the number of pretraining steps, comparing this implementation to fairseq (both with and without quantisation). This is on Italian CommonVoice 7.0, using the validated split as the pretraining set and the train split for finetuning.

[Figure pretrain_wer_comparison: finetuned WER vs. number of pretraining steps, this implementation vs. fairseq, with and without quantisation]

I made a wrapper around the whole w2v2 object to make it easy to hold connecting objects like the masking tensor and the various projection layers. The feature extractor and encoder are arguments that can be overridden, and vector quantisation can be toggled on or off. The implementation follows the most recent fairseq implementation (but not the very recent conformer additions), so it uses normalise_before=True and layer norm instead of group norm.
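To make the structure concrete, here is a rough sketch of what such a wrapper could look like. This is illustrative only; the class and argument names are hypothetical and not the actual code in this PR.

```python
import torch


class W2V2Wrapper(torch.nn.Module):
    """Hypothetical sketch of a wav2vec 2.0 pretraining wrapper.

    Holds the pieces that connect the feature extractor and the encoder:
    the learned mask embedding, a final projection, and (optionally) the
    vector quantiser used to build the targets.
    """

    def __init__(self, feature_extractor, encoder, quantiser=None, embed_dim=768):
        super().__init__()
        self.feature_extractor = feature_extractor  # e.g. a convolutional frontend
        self.encoder = encoder                      # e.g. a TransformerEncoder
        self.quantiser = quantiser                  # None toggles quantisation off
        # Learned embedding that replaces masked frames.
        self.mask_emb = torch.nn.Parameter(torch.empty(embed_dim).uniform_())
        self.final_proj = torch.nn.Linear(embed_dim, embed_dim)

    def forward(self, wav, mask):
        feats = self.feature_extractor(wav)  # (batch, time, channels)
        # Targets come from the unmasked features (quantised if enabled).
        targets = self.quantiser(feats) if self.quantiser is not None else feats
        masked = feats.clone()
        masked[mask] = self.mask_emb.to(masked.dtype)  # mask: bool (batch, time)
        context = self.encoder(masked)  # contextualised representations
        return self.final_proj(context), targets
```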

One can use the existing finetuning script inside /ASR/CTC to train the pretrained model.

Minor notes and TODOs:

  • The queue at my institute favors short jobs, so I added a run_opts option "train_time_hours" that lets me stop training after x hours (I have another wrapper that then takes care of resubmitting)
  • Overloaded fit() for logging
  • Overloaded update_average() because the original relies on self.step, which is reset every epoch; I also don't follow the formula and wonder why not just use an EMA (see the sketch after this list)
  • Should I actually put the recipe in CommonVoice/self-supervised-learning? I think some classes could maybe be moved to a better location
  • Generally needs more docstrings etc.
  • Made a change to make the logging output more informative
  • I use wandb, so there is an option to use that; I should change it later to use the existing wandb logger (I saw it but skipped using it for now)
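
On the update_average() point above, an EMA-based running loss could look roughly like this. It is a minimal sketch of the idea rather than the code in this PR, and the smoothing factor alpha=0.99 is just an assumed value.

```python
def update_average(self, loss, avg_loss, alpha=0.99):
    """Hypothetical EMA-based running loss average.

    Unlike a formula that depends on self.step, this behaves the same
    across epoch boundaries because it only uses the previous average.
    """
    loss = float(loss.detach())
    if avg_loss is None:
        return loss  # first batch: initialise with the raw loss
    # Exponential moving average: older batches decay geometrically.
    return alpha * avg_loss + (1.0 - alpha) * loss
```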

@TParcollet TParcollet self-requested a review February 26, 2022 08:28
@TParcollet TParcollet added the "work in progress (Not ready for merge)" label on Feb 26, 2022
@RuABraun RuABraun force-pushed the feature/wav2vec-pretrain branch 2 times, most recently from f93e06d to ed3c983 on April 3, 2022 18:41
@RuABraun (Collaborator, Author) commented Apr 5, 2022

The latest commit is the result of merging mine and guille's implementations. Best to ignore the CommonVoice files as those are not up to date; I'd like to focus on getting the LibriSpeech implementation done before we update CommonVoice.

@TParcollet (Collaborator) left a comment


Thanks for the huge and awesome work! Please find my major comments in this review :-)

@RuABraun RuABraun force-pushed the feature/wav2vec-pretrain branch 2 times, most recently from 117e6e9 to 4caef3c on April 11, 2022 16:16
@RuABraun RuABraun force-pushed the feature/wav2vec-pretrain branch 2 times, most recently from 5d9c6cf to e4d33d6 on May 23, 2022 15:49
@RuABraun (Collaborator, Author) commented May 23, 2022

I've refactored a lot, cleaned things up, and added more docstrings; I think it's worth going through another review.

Some things I still plan to do:

  • I think the W2VLatentEncoder object is unnecessary; I plan to remove it and just use TransformerEncoder directly
  • I'm not 100% sure the changes I made to DynamicBatchSampler are necessary; need to look into it more
  • remove the train_time_hours option I added
  • remove the disable_halfprec option being used in compute_forward(); I have a build running to check the impact
  • probably should add more info to the README

@anautsch anautsch changed the base branch from develop to develop-v2 June 1, 2022 15:40
@TParcollet (Collaborator) left a comment


Thanks again @RuABraun for this amazing job! Here is my review. I think after that, I can start a big training run and see. We are close to something.

@TParcollet (Collaborator) commented

Also, please resolve the conflict so that I can run the tests :p

@RuABraun (Collaborator, Author) commented Jun 6, 2022

There are definitely some changes needed for core.py (for example no_sync()). Did I understand you correctly that you would prefer these to be in a different PR? @TParcollet

@TParcollet (Collaborator) commented

Hi @RuABraun, what are the next steps to follow in your view?

@RuABraun (Collaborator, Author) commented

PR #1449 needs to be merged first.

I recently figured out one thing that was causing bad performance, but there's still one more issue remaining (I get significantly better performance training with 2 GPUs than with 4). I hope to figure that out in the next ~2 weeks (I was busy with interviews over the last ~6 weeks, but thankfully that's over now).

I got a few interesting results from trying out different things; I'll post about them on Slack soon.

@RuABraun RuABraun changed the title from "(WIP) wav2vec2 pretraining implemented with speechbrain" to "wav2vec2 pretraining implemented with speechbrain" on Jul 25, 2022
@RuABraun RuABraun force-pushed the feature/wav2vec-pretrain branch 3 times, most recently from 019733b to b1d5c15 on July 25, 2022 20:57
@RuABraun (Collaborator, Author) commented

Waiting on #1518.

@TParcollet (Collaborator) commented

What an amazing PR! Thank you so much @RuABraun and @gcambara! We are finally done!

@TParcollet TParcollet merged commit c075b41 into speechbrain:develop Sep 22, 2022
@RuABraun (Collaborator, Author) commented

@TParcollet Noting two things down here to look at in the future for better performance:

  1. No weight decay on biases and layer norm parameters (see the sketch after this list)

  2. Finetuning our model only updates the transformer encoder; the HuggingFace model finetuning seems to also adjust the convolutional module.
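
On point 1, a common PyTorch pattern is to split the parameters into two optimizer groups so that biases and norm weights are excluded from weight decay. The sketch below is only illustrative and not the code in this PR; the weight decay value is just an example.

```python
import torch


def build_param_groups(model, weight_decay=1e-2):
    """Split parameters so biases and norm scales get no weight decay."""
    decay, no_decay = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # 1-D tensors cover biases and LayerNorm/GroupNorm scales.
        if param.ndim <= 1 or name.endswith(".bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Example usage (hypothetical values):
# optimizer = torch.optim.AdamW(build_param_groups(model), lr=1e-4)
```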
