@poonehmousavi
Collaborator

Add Whisper fine-tuning recipes for Common Voice data for the following languages

@Adel-Moumen Adel-Moumen self-assigned this Jan 23, 2023
Collaborator

@Adel-Moumen Adel-Moumen left a comment

Hello,

Many thanks for this PR! You did a really great job. :-)

Could you please remove the file environment.yaml? I don't see any reason to keep it.

Please see all the issues mentioned in the review.

Concerning normalized_transcripts=True, have you trained your models with it? If so, it might be worth checking whether it impacted your results. Also, are you going to release the models on the Gdrive/HF?

Please fix the pre-commit.

Thanks again for your impressive work! :-)
Adel

@poonehmousavi
Collaborator Author

All changes have been applied and pre-commit has been tested.

Collaborator

@Adel-Moumen Adel-Moumen left a comment

Really neat what you are doing! Thanks again!

Please look at my comments, and could you please take a look at the pre-commit failures? Read the following tutorial https://speechbrain.readthedocs.io/en/latest/contributing.html so that you know how to solve the issues.

Thanks!

@anautsch
Collaborator

@Adel-Moumen pointed me to your current error log.

CSV:  recipes/CommonVoice/self-supervised-learning/wav2vec2/common_voice_prepare.py
path: recipes/CommonVoice/self-supervised-learning/wav2vec2/common_voice_prepare.py

They look identical, yet the error comes from

    if not (os.path.exists(file.strip())):
        print(
            "\tERROR: The file %s listed in %s does not exist!"
            % (file, recipe_csvfile)
        )

which suggests the file doesn't exist, yet the paths etc. seem to match.

You can try this offline after making a change with

pytest tests/consistency

Note: this will also consider files that are not versioned by git but are in your repo folders.

I don't think my post is particularly helpful, other than to say you don't need to push that much here. I'm looking into it to see if I can find something more helpful...

ASR,CommonVoice,recipes/CommonVoice/ASR/transformer/train_with_whisper.py,recipes/CommonVoice/ASR/transformer/hparams/train_sr_hf_whisper.yaml,recipes/CommonVoice/ASR/transformer/common_voice_prepare.py,recipes/CommonVoice/ASR/transformer/README.md,https://drive.google.com/drive/folders/11NMzY0zV-NqJmPMyZfC3RtT64bYe-G_O?usp=sharing,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=1 --skip_prep=True,
ASR,CommonVoice,recipes/CommonVoice/ASR/transformer/train_with_whisper.py,recipes/CommonVoice/ASR/transformer/hparams/train_mn_hf_whisper.yaml,recipes/CommonVoice/ASR/transformer/common_voice_prepare.py,recipes/CommonVoice/ASR/transformer/README.md,https://drive.google.com/drive/folders/11NMzY0zV-NqJmPMyZfC3RtT64bYe-G_O?usp=sharing,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=1 --skip_prep=True,
ASR,CommonVoice,recipes/CommonVoice/ASR/transformer/train_with_whisper.py,recipes/CommonVoice/ASR/transformer/hparams/train_hi_hf_whisper.yaml,recipes/CommonVoice/ASR/transformer/common_voice_prepare.py,recipes/CommonVoice/ASR/transformer/README.md,https://drive.google.com/drive/folders/11NMzY0zV-NqJmPMyZfC3RtT64bYe-G_O?usp=sharing,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=1 --skip_prep=True,
SSL,CommonVoice,recipes/CommonVoice/self-supervised-learning/wav2vec2/train_hf_wav2vec2.py,recipes/CommonVoice/self-supervised-learning/wav2vec2/hparams/wav2vec2_base.yaml,recipes/CommonVoice/self-supervised-learning/wav2vec2/common_voice_prepare.py,recipes/CommonVoice/self-supervised-learning/wav2vec2/README.md,,,--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=tests/samples/annotation/ASR_train.csv --number_of_epochs=2 --skip_prep=True --d_model=128 --wav2vec2_folder=tests/tmp/wav2vec2_checkpoint,
Collaborator

@poonehmousavi this looks good btw, the line of concern did not change ... interesting

Collaborator Author

Yes, that is what makes it more confusing.

Collaborator

Wondering if the GitHub workflow got stuck in an odd state.

Collaborator Author

Additionally, I ran pytest on my local repo and didn't get that error, and I didn't have any uncommitted changes related to those files.

Collaborator

it's the workflow then...

Collaborator

I could reproduce the error on my end.

Collaborator

ll recipes/CommonVoice/self-supervised-learning/wav2vec2/common_voice_prepare.py
recipes/CommonVoice/self-supervised-learning/wav2vec2/common_voice_prepare.py -> '../../common_voice_prepare.py'$'\n'

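This is an invalid symlink: the link target ends with a newline character ($'\n'), so it does not resolve to ../../common_voice_prepare.py.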
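
For anyone hitting the same failure, here is a minimal sketch of how such a link could be spotted locally. It is not part of the SpeechBrain test suite; the helper names `find_dangling_symlinks` and `demo` are hypothetical, and it assumes a POSIX filesystem where symlinks can be created freely. It relies on the fact that `os.path.exists` follows symlinks (and so returns False for a dangling one), while `os.path.islink` looks at the link itself.

```python
import os
import tempfile


def find_dangling_symlinks(root):
    """Return (link_path, raw_target) pairs for symlinks under root whose target is missing."""
    dangling = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                dangling.append((path, os.readlink(path)))
    return dangling


def demo():
    """Build a throwaway tree with one valid link and one link whose target ends in a newline."""
    with tempfile.TemporaryDirectory() as root:
        # The real file that both links are meant to point to.
        open(os.path.join(root, "common_voice_prepare.py"), "w").close()
        recipe_dir = os.path.join(root, "recipe")
        os.mkdir(recipe_dir)
        # Valid relative link.
        os.symlink("../common_voice_prepare.py", os.path.join(recipe_dir, "good_link.py"))
        # Same target text plus a trailing newline: the link exists but resolves to nothing.
        os.symlink("../common_voice_prepare.py\n", os.path.join(recipe_dir, "bad_link.py"))
        return [(os.path.relpath(p, root), t) for p, t in find_dangling_symlinks(root)]


if __name__ == "__main__":
    for link, target in demo():
        # repr() makes the stray newline in the target visible.
        print(link, "->", repr(target))
```

Pointing `find_dangling_symlinks` at a checkout's `recipes` folder and printing each target with `repr()` should make a stray trailing newline like the one above easy to see.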

@Adel-Moumen
Collaborator

LGTM!

Many thanks for your great work. It has been a pleasure to review your PR.

@Adel-Moumen Adel-Moumen merged commit c723843 into speechbrain:develop Feb 17, 2023
@poonehmousavi poonehmousavi deleted the whisper-finetunng-common-voice branch July 29, 2024 16:59
4 participants