Whisper fine-tuning on Common Voice #1809
Merged: Adel-Moumen merged 51 commits into speechbrain:develop from poonehmousavi:whisper-finetunng-common-voice on Feb 17, 2023
Commits
51 commits, all authored by poonehmousavi:
3ef35ae add recipe for Whisper fine-tuning on Common Voice data
7125dc6 add encoder-freeze option to hparams + add extra dependencies
536de68 minor bug
e6faeaf set accented_letter to True for Arabic and French
7f921bc minor fix
ec1ecec fix reading audio bug
2ea8dc1 remove extra files
581011a fix loss
6a73603 fix loss in ar and fr hparams
6fd540a add environment
7ebf8da change test to greedy search instead of beam search to solve memory …
f9e5a9b add hparams for Mongolian, Spanish, Hindi, Serbian, German
f3158c8 fix
b67c8e3 fix memory issue + add ja and fa
5a3b864 add whisper-encoder_only for common_voice, fix minor bugs
ab32fdd fix bug for es
2b5e79f add weighted sum version
cb90d19 update readme file
27edc2c modify en hparams - set accented letter to False
7253544 add test_only option
6af3bdd add final result table - final cleaning
e917485 minor change
523bd2e minor change
add221d fix type
8e64623 fix requested change in review
d92896a remove environment file
0ca9ea6 fix flag checking for test_only
3f16d23 bug fix for ignoring padded tokens for loss calculation
a4b2390 loss func
fcb7617 add comments
51b19c5 final refactoring
d1d3042 final refactoring
9b0a4da minor refactoring (removing blank line, ..)
a6ba1b8 remove blank lines
f71c2b5 minor refactor
73e22c3 apply pre-commit changes
2f4d83d fix pre-commit bugs
b7815e4 Merge branch 'speechbrain:develop' into whisper-finetunng-common-voice
46a41b6 add test, fix pre-commit errors
4503074 fix CL test errors and pre-commit error for complicated method
0622477 fix link issue for CL workflow
bdca54f test
a136f63 fix cl symlink bug
67008c7 Merge branch 'whisper-finetunng-common-voice' of https://github.com/p…
b997e6f Merge branch 'speechbrain:develop' into whisper-finetunng-common-voice
2034e26 remove whitespace
fe0af9b fix for CL
43ed4d1 remove doc_str example for whisper interface
a80c979 fix readme file problem
afbd5af remove HF link from readme file
4198fff remove datasets from dependencies
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| transformers |

recipes/CommonVoice/ASR/transformer/hparams/train_ar_hf_whisper.yaml (142 additions, 0 deletions)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| # ################################ | ||
| # Model: Whisper (Encoder-Decoder) + NLL | ||
| # Augmentation: TimeDomainSpecAugment | ||
| # Authors: Pooneh Mousavi 2022 | ||
| # ################################ | ||
|
|
||
| # Seed needs to be set at top of yaml, before objects with parameters are made | ||
| seed: 1986 | ||
| __set_seed: !apply:torch.manual_seed [!ref <seed>] | ||
| output_folder: !ref results/train_whisper/<seed>/<locale> | ||
| wer_file: !ref <output_folder>/wer.txt | ||
| save_folder: !ref <output_folder>/save | ||
| train_log: !ref <output_folder>/train_log.txt | ||
|
|
||
| # HuggingFace hub identifier of the pretrained Whisper model to fine-tune. | ||
| whisper_hub: openai/whisper-tiny | ||
| test_only: False # Set it to True if you only want to do the evaluation | ||
|
|
||
| # Normalize inputs with the same normalization done in the paper (https://cdn.openai.com/papers/whisper.pdf). Refer to Appendix C for further information. | ||
| normalized_transcripts: True | ||
|
|
||
| # Data files | ||
| locale: ar # CommonVoice language code, e.g. 'it' for Italian, 'fr' for French, 'en' for English. | ||
| data_folder: !PLACEHOLDER | ||
| train_tsv_file: !ref <data_folder>/train.tsv # Standard CommonVoice .tsv files | ||
| dev_tsv_file: !ref <data_folder>/dev.tsv # Standard CommonVoice .tsv files | ||
| test_tsv_file: !ref <data_folder>/test.tsv # Standard CommonVoice .tsv files | ||
| accented_letters: True | ||
| train_csv: !ref <save_folder>/train.csv | ||
| valid_csv: !ref <save_folder>/dev.csv | ||
| test_csv: !ref <save_folder>/test.csv | ||
| skip_prep: False # Skip data preparation | ||
|
|
||
| # We remove utterances longer than 10s in the train/dev/test sets as | ||
| # longer utterances most likely correspond to "open microphones". | ||
| avoid_if_longer_than: 10.0 | ||
|
|
||
| ckpt_interval_minutes: 30 # save checkpoint every N min | ||
|
|
||
| # Training parameters | ||
| number_of_epochs: 1 | ||
| lr_whisper: 0.00003 | ||
| sorting: ascending | ||
| auto_mix_prec: False | ||
| sample_rate: 16000 | ||
|
|
||
| # With data_parallel batch_size is split into N jobs | ||
| # With DDP batch_size is multiplied by N jobs | ||
| batch_size: 12 | ||
| test_batch_size: 8 | ||
|
|
||
| # These values are only used for the searchers. | ||
| # They need to be hardcoded and should not be changed with Whisper. | ||
| # They are used as part of the searching process. | ||
| # The bos token of the searcher will be timestamp_index | ||
| # and will be concatenated with the bos, language and task tokens. | ||
| timestamp_index: 50363 | ||
| eos_index: 50257 | ||
| bos_index: 50258 | ||
|
|
||
| # Decoding parameters | ||
| min_decode_ratio: 0.0 | ||
| max_decode_ratio: 0.1 | ||
| test_beam_size: 8 | ||
|
|
||
| # Model parameters | ||
| freeze_whisper: False | ||
| freeze_encoder: True | ||
|
|
||
| train_loader_kwargs: | ||
| batch_size: !ref <batch_size> | ||
|
|
||
| valid_loader_kwargs: | ||
| batch_size: !ref <batch_size> | ||
|
|
||
| test_loader_kwargs: | ||
| batch_size: !ref <test_batch_size> | ||
|
|
||
| # | ||
| # Functions and classes | ||
| # | ||
| epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter | ||
| limit: !ref <number_of_epochs> | ||
|
|
||
| augmentation: !new:speechbrain.lobes.augment.TimeDomainSpecAugment | ||
| sample_rate: !ref <sample_rate> | ||
| speeds: [95, 100, 105] | ||
|
|
||
| whisper: !new:speechbrain.lobes.models.huggingface_whisper.HuggingFaceWhisper | ||
| source: !ref <whisper_hub> | ||
| freeze: !ref <freeze_whisper> | ||
| freeze_encoder: !ref <freeze_encoder> | ||
| save_path: !ref <save_folder>/whisper_checkpoint | ||
| encoder_only: False | ||
|
|
||
| log_softmax: !new:speechbrain.nnet.activations.Softmax | ||
| apply_log: True | ||
|
|
||
| nll_loss: !name:speechbrain.nnet.losses.nll_loss | ||
|
|
||
| modules: | ||
| whisper: !ref <whisper> | ||
|
|
||
| whisper_opt_class: !name:torch.optim.AdamW | ||
| lr: !ref <lr_whisper> | ||
| weight_decay: 0.000000001 | ||
|
|
||
| valid_greedy_searcher: !new:speechbrain.decoders.seq2seq.S2SWhisperGreedySearch | ||
| model: !ref <whisper> | ||
| bos_index: !ref <timestamp_index> | ||
| eos_index: !ref <eos_index> | ||
| min_decode_ratio: !ref <min_decode_ratio> | ||
| max_decode_ratio: !ref <max_decode_ratio> | ||
|
|
||
| test_beam_searcher: !new:speechbrain.decoders.seq2seq.S2SWhisperBeamSearch | ||
| module: [!ref <whisper>] | ||
| bos_index: !ref <timestamp_index> | ||
| eos_index: !ref <eos_index> | ||
| min_decode_ratio: !ref <min_decode_ratio> | ||
| max_decode_ratio: !ref <max_decode_ratio> | ||
| beam_size: !ref <test_beam_size> | ||
|
|
||
| lr_annealing_whisper: !new:speechbrain.nnet.schedulers.NewBobScheduler | ||
| initial_value: !ref <lr_whisper> | ||
| improvement_threshold: 0.0025 | ||
| annealing_factor: 0.9 | ||
| patient: 0 | ||
|
|
||
| checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer | ||
| checkpoints_dir: !ref <save_folder> | ||
| recoverables: | ||
| whisper: !ref <whisper> | ||
| scheduler_whisper: !ref <lr_annealing_whisper> | ||
| counter: !ref <epoch_counter> | ||
|
|
||
| train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger | ||
| save_file: !ref <train_log> | ||
|
|
||
| error_rate_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats | ||
|
|
||
| cer_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats | ||
| split_tokens: True | ||
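
The `lr_annealing_whisper` entry above configures a NewBob-style schedule: when the relative improvement of the validation metric falls below `improvement_threshold`, the learning rate is multiplied by `annealing_factor`. The following is a minimal plain-Python sketch of that rule under the values in this file (`patient: 0`, so annealing is immediate). The function name `newbob_step` and the metric history are hypothetical, and this is an illustration rather than SpeechBrain's implementation.

```python
def newbob_step(lr, prev_metric, metric,
                improvement_threshold=0.0025, annealing_factor=0.9):
    """Return the next learning rate under a simplified NewBob rule.

    Assumes patience 0: the rate is annealed as soon as the relative
    improvement drops below the threshold.
    """
    if prev_metric is None:
        # First evaluation: nothing to compare against, keep the rate.
        return lr
    if prev_metric == 0:
        improvement = 0.0
    else:
        improvement = (prev_metric - metric) / prev_metric
    if improvement < improvement_threshold:
        return lr * annealing_factor
    return lr


# Hypothetical validation losses over four epochs.
history = [1.00, 0.80, 0.799, 0.70]
lr = 3e-5  # lr_whisper from the hparams file
prev = None
schedule = []
for metric in history:
    lr = newbob_step(lr, prev, metric)
    schedule.append(lr)
    prev = metric
```

In this sketch only the third epoch triggers annealing, because its relative improvement (0.00125) is below the 0.0025 threshold. With `number_of_epochs: 1`, as set in this file, there is never a previous value to compare against, so the rate would stay at `lr_whisper`.
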

recipes/CommonVoice/ASR/transformer/hparams/train_fa_hf_whisper.yaml (142 additions, 0 deletions)

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,142 @@ | ||
| # ################################ | ||
| # Model: Whisper (Encoder-Decoder) + NLL | ||
| # Augmentation: TimeDomainSpecAugment | ||
| # Authors: Pooneh Mousavi 2022 | ||
| # ################################ | ||
|
|
||
| # Seed needs to be set at top of yaml, before objects with parameters are made | ||
| seed: 1986 | ||
| __set_seed: !apply:torch.manual_seed [!ref <seed>] | ||
| output_folder: !ref results/train_whisper/<seed>/<locale> | ||
| wer_file: !ref <output_folder>/wer.txt | ||
| save_folder: !ref <output_folder>/save | ||
| train_log: !ref <output_folder>/train_log.txt | ||
|
|
||
| # HuggingFace hub identifier of the pretrained Whisper model to fine-tune. | ||
| whisper_hub: openai/whisper-tiny | ||
| test_only: False # Set it to True if you only want to do the evaluation | ||
|
|
||
| # Normalize inputs with the same normalization done in the paper (https://cdn.openai.com/papers/whisper.pdf). Refer to Appendix C for further information. | ||
| normalized_transcripts: True | ||
|
|
||
| # Data files | ||
| locale: fa # CommonVoice language code, e.g. 'it' for Italian, 'fr' for French, 'en' for English. | ||
| data_folder: !PLACEHOLDER | ||
| train_tsv_file: !ref <data_folder>/train.tsv # Standard CommonVoice .tsv files | ||
| dev_tsv_file: !ref <data_folder>/dev.tsv # Standard CommonVoice .tsv files | ||
| test_tsv_file: !ref <data_folder>/test.tsv # Standard CommonVoice .tsv files | ||
| accented_letters: True | ||
| train_csv: !ref <save_folder>/train.csv | ||
| valid_csv: !ref <save_folder>/dev.csv | ||
| test_csv: !ref <save_folder>/test.csv | ||
| skip_prep: False # Skip data preparation | ||
|
|
||
| # We remove utterances longer than 10s in the train/dev/test sets as | ||
| # longer utterances most likely correspond to "open microphones". | ||
| avoid_if_longer_than: 10.0 | ||
|
|
||
| ckpt_interval_minutes: 30 # save checkpoint every N min | ||
|
|
||
| # Training parameters | ||
| number_of_epochs: 1 | ||
| lr_whisper: 0.00003 | ||
| sorting: ascending | ||
| auto_mix_prec: False | ||
| sample_rate: 16000 | ||
|
|
||
| # With data_parallel batch_size is split into N jobs | ||
| # With DDP batch_size is multiplied by N jobs | ||
| batch_size: 12 | ||
| test_batch_size: 8 | ||
|
|
||
| # These values are only used for the searchers. | ||
| # They need to be hardcoded and should not be changed with Whisper. | ||
| # They are used as part of the searching process. | ||
| # The bos token of the searcher will be timestamp_index | ||
| # and will be concatenated with the bos, language and task tokens. | ||
| timestamp_index: 50363 | ||
| eos_index: 50257 | ||
| bos_index: 50258 | ||
|
|
||
| # Decoding parameters | ||
| min_decode_ratio: 0.0 | ||
| max_decode_ratio: 0.1 | ||
| test_beam_size: 8 | ||
|
|
||
| # Model parameters | ||
| freeze_whisper: False | ||
| freeze_encoder: True | ||
|
|
||
| train_loader_kwargs: | ||
| batch_size: !ref <batch_size> | ||
|
|
||
| valid_loader_kwargs: | ||
| batch_size: !ref <batch_size> | ||
|
|
||
| test_loader_kwargs: | ||
| batch_size: !ref <test_batch_size> | ||
|
|
||
| # | ||
| # Functions and classes | ||
| # | ||
| epoch_counter: !new:speechbrain.utils.epoch_loop.EpochCounter | ||
| limit: !ref <number_of_epochs> | ||
|
|
||
| augmentation: !new:speechbrain.lobes.augment.TimeDomainSpecAugment | ||
| sample_rate: !ref <sample_rate> | ||
| speeds: [95, 100, 105] | ||
|
|
||
| whisper: !new:speechbrain.lobes.models.huggingface_whisper.HuggingFaceWhisper | ||
| source: !ref <whisper_hub> | ||
| freeze: !ref <freeze_whisper> | ||
| freeze_encoder: !ref <freeze_encoder> | ||
| save_path: !ref <save_folder>/whisper_checkpoint | ||
| encoder_only: False | ||
|
|
||
| log_softmax: !new:speechbrain.nnet.activations.Softmax | ||
| apply_log: True | ||
|
|
||
| nll_loss: !name:speechbrain.nnet.losses.nll_loss | ||
|
|
||
| modules: | ||
| whisper: !ref <whisper> | ||
|
|
||
| whisper_opt_class: !name:torch.optim.AdamW | ||
| lr: !ref <lr_whisper> | ||
| weight_decay: 0.000000001 | ||
|
|
||
| valid_greedy_searcher: !new:speechbrain.decoders.seq2seq.S2SWhisperGreedySearch | ||
| model: !ref <whisper> | ||
| bos_index: !ref <timestamp_index> | ||
| eos_index: !ref <eos_index> | ||
| min_decode_ratio: !ref <min_decode_ratio> | ||
| max_decode_ratio: !ref <max_decode_ratio> | ||
|
|
||
| test_beam_searcher: !new:speechbrain.decoders.seq2seq.S2SWhisperBeamSearch | ||
| module: [!ref <whisper>] | ||
| bos_index: !ref <timestamp_index> | ||
| eos_index: !ref <eos_index> | ||
| min_decode_ratio: !ref <min_decode_ratio> | ||
| max_decode_ratio: !ref <max_decode_ratio> | ||
| beam_size: !ref <test_beam_size> | ||
|
|
||
| lr_annealing_whisper: !new:speechbrain.nnet.schedulers.NewBobScheduler | ||
| initial_value: !ref <lr_whisper> | ||
| improvement_threshold: 0.0025 | ||
| annealing_factor: 0.9 | ||
| patient: 0 | ||
|
|
||
| checkpointer: !new:speechbrain.utils.checkpoints.Checkpointer | ||
| checkpoints_dir: !ref <save_folder> | ||
| recoverables: | ||
| whisper: !ref <whisper> | ||
| scheduler_whisper: !ref <lr_annealing_whisper> | ||
| counter: !ref <epoch_counter> | ||
|
|
||
| train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger | ||
| save_file: !ref <train_log> | ||
|
|
||
| error_rate_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats | ||
|
|
||
| cer_computer: !name:speechbrain.utils.metric_stats.ErrorRateStats | ||
| split_tokens: True | ||
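
Two comments in these hparams files describe simple arithmetic: the decode ratios bound the output length relative to the encoder output length, and `batch_size` scales differently under data_parallel and DDP. The sketch below works through both under stated assumptions: that the step budget is the encoder length times the ratio, truncated to an integer, and that Whisper's encoder emits 1500 frames for its padded 30 s input. The helper names `max_decode_steps` and `effective_batch_size` are hypothetical and not part of SpeechBrain.

```python
def max_decode_steps(encoder_frames, max_decode_ratio=0.1):
    """Upper bound on generated tokens, assuming budget = frames * ratio."""
    return int(encoder_frames * max_decode_ratio)


def effective_batch_size(batch_size, n_jobs, mode):
    """Total examples per optimizer step, following the hparams comments.

    data_parallel: batch_size is split into N jobs, so the total is unchanged.
    ddp: each of the N jobs gets batch_size, so the total is multiplied by N.
    """
    if mode == "data_parallel":
        return batch_size
    if mode == "ddp":
        return batch_size * n_jobs
    raise ValueError(f"unknown mode: {mode}")


# Assumed encoder length for Whisper's padded 30 s input.
steps = max_decode_steps(1500, max_decode_ratio=0.1)

# batch_size: 12 from the hparams file, on a hypothetical 4-GPU machine.
total_dp = effective_batch_size(12, 4, "data_parallel")
total_ddp = effective_batch_size(12, 4, "ddp")
```

Under these assumptions the searchers may emit at most 150 tokens per utterance, and the same `batch_size: 12` gives 12 examples per step with data_parallel but 48 with DDP on four GPUs, which is worth keeping in mind when comparing runs.
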