Skip to content

Conversation

@shucongzhang
Copy link
Contributor

What does this PR do?

This PR adds the recipes of data preparation and training AED models with Libriheavy.

Shucong Zhang/Embedded AI /SRUK/Engineer/Samsung Electronics and others added 3 commits December 9, 2024 21:20
@TParcollet
Copy link
Collaborator

TParcollet commented Dec 9, 2024

Thanks @shucongzhang! I think @mravanelli, @Adel-Moumen and @pplantinga could have a look (or even review) :-)
One last thing to do @shucongzhang, as it's a new recipe, you'd need to add a recipe testing case (see the tests/ folder). You'd need to add a csv file into a folder with a line equivalent to the ones we have for the LibriSpeech test csv. You may also want to write somewhere that the full dataset is 50k hours+ so prettttttty heavy to store :p

Copy link
Collaborator

@Adel-Moumen Adel-Moumen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @shucongzhang, thanks for this great PR! I left a couple of comments :)

@Adel-Moumen
Copy link
Collaborator

Hi all,

I ran the PR, and the scripts are all working! (nice)

I only have concern regarding the splits. Indeed, the current state of the PR forces the user to use the dev/test sets of Libriheavy. Unfortunately, those splits contains audio all located in the large split. If you are only using the small/medium (usually because of storage bottleneck) sets, you can't use this PR. What I would suggest in this case, is:

  1. do not force the dev nor test set from libriheavy in the prepare.py
  2. Document this behaviour in the README and make the user is aware that he can alternatively use the dev/test of LibriSpeech directly by following our recipes to prepare the data and change the location of valid_csv and test_csv.

Other than that, it's all good to me. I made one change by adding the possibility to modify SpeechBrain audio backend instead of using soundfile directly. We should also add this note in the README saying that we are using Soundfile for performances reasons.

Copy link
Collaborator

@TParcollet TParcollet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @shucongzhang and @Adel-Moumen for the work! Let's advertise the recipe this week!

["id", "sig", "wrd", "tokens_bos", "tokens_eos", "tokens"],
)

# 5. If Dynamic Batching is used, we instantiate the needed samplers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am all for that we remove the possibility of NOT usng dynamic batching in future recipes. It's a waste of compute.

@TParcollet TParcollet merged commit 948fa2b into speechbrain:develop Feb 8, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants