
Conversation

@BenoitWang (Collaborator) commented Jun 22, 2023

Hi @mravanelli, this is a first commit of the ZaionEmotionDataset recipe. Some small changes are still to come very soon. Paper link available here: https://arxiv.org/pdf/2306.12991.pdf

@mravanelli mravanelli self-requested a review June 26, 2023 14:21
@mravanelli mravanelli self-assigned this Jun 26, 2023
@mravanelli (Collaborator)

Hi @BenoitWang, thank you for this PR. Feel free to fix the small indentation issue that prevents the tests from passing. Then, I can proceed with the review.

@mravanelli (Collaborator)

Hi @BenoitWang,
thank you for this PR! Here are my comments from the first code inspection:

Extra-dependencies:

  • The file should be called extra_requirements.txt (just to be uniform with the other ones).
  • The current version of the file takes this form:
pathlib==1.0.1
pydub==0.25.1
transformers==4.26.0
webrtcvad==2.0.10
  • Normally, it is better to have something like "dependency>=version". Otherwise, if you have a newer version of the library, it gets uninstalled, which is quite annoying. I would take this opportunity to make sure the code works with the latest version of each dependency. For instance, note that some changes have been made in transformers >= 4.30 (and our goal is to switch to that in other PRs; see "Change needed in Whisper fine-tuning recipe to accommodate transformers 4.30.0" #2016).

  • It looks like you are using webrtcvad as a VAD. This is pretty good, but we also have the SpeechBrain VAD (see https://huggingface.co/speechbrain/vad-crdnn-libriparty) that might be considered (in our experience it works well in some contexts and not in others, but if the performance is good for this dataset, it is better to use our own VAD; a minimal usage sketch follows this list).
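
For reference, here is roughly how the SpeechBrain VAD is used, following the model card of vad-crdnn-libriparty (the audio file name below is a placeholder):

from speechbrain.pretrained import VAD

# Download the pretrained CRDNN VAD trained on LibriParty
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)
# Speech segment boundaries (start/end in seconds) for an audio file
boundaries = vad.get_speech_segments("audio_file.wav")  # placeholder path
vad.save_boundaries(boundaries)  # prints the detected segments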

README

  • Add the paper link and citation when available.
  • The Dropbox links for RAVDESS, ESD, JL-CORPUS, and EmoV-DB are not working (permission issues?).
  • Add a table with the results.
  • Upload the logs to Dropbox and add a link to them in the README file.

DOCUMENTATION

TRAINING SCRIPT

  • In the forward function of train.py, I can see .to("cuda:0"); shouldn't it be .to(self.device)? Similarly, we have hparams["wav2vec2"] = hparams["wav2vec2"].to("cuda") (see the sketch after this list).
  • In train.py, some lines are commented out, like # replacements={"data_root": hparams["data_folder"]} or # sb.dataio.dataset.add_dynamic_item(datasets, label_pipeline). Do we need to uncomment them?
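
For illustration, a minimal sketch of the suggested device handling (the class name, function body, and module name here are hypothetical, not copied from the recipe):

import speechbrain as sb

class EmoDiaBrain(sb.Brain):  # hypothetical class name
    def compute_forward(self, batch, stage):
        # self.device is set by the Brain class from the run options,
        # so this works on CPU or on any GPU, unlike a hard-coded
        # .to("cuda:0")
        batch = batch.to(self.device)
        wavs, lens = batch.sig
        return self.modules.wav2vec2(wavs)  # hypothetical module name

Also note that the modules registered in the Brain's modules dict are moved to the right device automatically, so the explicit hparams["wav2vec2"].to("cuda") should not be needed.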

@mravanelli (Collaborator)

Also, as this is a new recipe, it should be added in speechbrain/tests/recipes. Feel free to follow up with me if you cannot figure out how to do it from the existing examples.

@BenoitWang (Collaborator, Author)

Hi @mravanelli, thanks for the comments. I've made the above modifications. To do:

  1. Test the SpeechBrain VAD.
  2. Put the logs on Dropbox (I still have some experiments to run).
  3. Add the recipe to speechbrain/tests/recipes.

@mravanelli (Collaborator)

Thank you @BenoitWang for addressing my previous comments. I can now successfully run the recipe. However, I have a few more comments to share:

  1. I suggest making some improvements to the README. For example, it would be helpful to include an example of what an input audio and the output of an emotion diarization system could look like (see the illustrative sketch after this list). Additionally, the reference to the paper mentioned in relation to emotion diarization is incorrect (it refers to a paper on LLMs). I recommend adding our paper as the appropriate reference.

  2. Please correct the following error in the README: you currently mention that to train the model, we should run python train_with_wav2vec.py hparams/train_with_wav2vec.yaml, but it should actually be python train.py hparams/train.yaml.

  3. During the data preparation, I noticed several warnings like "sox WARN dither: dither clipped 1 sample; decrease volume?" appearing. Please double-check if this is normal. If it is, we should consider suppressing these warnings.

  4. In the current version, all dataset paths must end with a slash ("/"), for example: $SLURM_TMPDIR/emotion/ESD/. Otherwise, an error occurs indicating a missing file for the ESD dataset. To make the data preparation more robust, I suggest properly handling paths that do not end with a slash (see the sketch after this list).

  5. In the table, we should only report the results with WavLM and mention that other results can be found in the paper. Additionally, besides providing the link to the Hugging Face (HF) model, we need to include the link to the new SpeechBrain Dropbox repo. I can grant you access to upload the output folder there.

  6. I believe it's necessary to create a suitable inference interface to support this new task. The input should be audio, such as a long recording, and the output should resemble what is provided in the VAD interface (e.g., https://huggingface.co/speechbrain/vad-crdnn-libriparty). Once the interface is ready, we can upload the model to the SpeechBrain HF.
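
Regarding point 1, the README example could look something like this (purely illustrative, with made-up timestamps; not taken from the recipe):

# Input: a long recording, e.g. "example.wav"
# Output: emotion segments with boundaries in seconds, e.g.:
{
    "example.wav": [
        {"start": 0.0, "end": 2.4, "emotion": "neutral"},
        {"start": 2.4, "end": 5.1, "emotion": "angry"},
    ]
}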
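
Regarding point 4, a minimal sketch of the more robust path handling (the function and variable names are hypothetical):

import os

def build_wav_path(data_folder, relative_path):
    # os.path.join works whether or not the user-provided folder
    # ends with a slash, e.g. ".../ESD" or ".../ESD/":
    return os.path.join(data_folder, relative_path)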

Please take these comments into consideration. Let me know if you have any questions or need further clarification.

Best regards,

@mravanelli (Collaborator)

Thank you @BenoitWang!
Let me share a couple of comments before merging this PR.

  1. In the README, you should add a note on the training time and the GPU needed to train the model. I tried using a 32GB GPU and it ran out of memory. Maybe we can add a note saying that the batch size can be reduced if you run out of memory (see the note at the end of this comment).

  2. I cannot run the example from the HF repo: https://huggingface.co/speechbrain/emotion-diarization-wavlm-large/tree/main. I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/ravanelm/speechbrain_emo/speechbrain/pretrained/interfaces.py", line 125, in foreign_class
    run_on_main(
  File "/scratch/ravanelm/speechbrain_emo/speechbrain/utils/distributed.py", line 61, in run_on_main
    func(*args, **kwargs)
  File "/scratch/ravanelm/speechbrain_emo/speechbrain/utils/parameter_transfer.py", line 246, in collect_files
    path = fetch(
  File "/scratch/ravanelm/speechbrain_emo/speechbrain/pretrained/fetching.py", line 161, in fetch
    fetched_file = huggingface_hub.hf_hub_download(
  File "/scratch/ravanelm/myenv_scratch/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/scratch/ravanelm/myenv_scratch/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ywang/zed_pr/sed_hf'. Use `repo_type` argument if needed.

Any idea? The API on the HF website is also not working.
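
As a side note on point 1: assuming the batch size is exposed as a batch_size field in the hparams YAML (as in most SpeechBrain recipes), it can be reduced from the command line without editing the file, e.g.:

python train.py hparams/train.yaml --batch_size=2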

@mravanelli (Collaborator)

Also, instead of using foreign_class as an interface, I would suggest adding a standard interface in speechbrain/pretrained/interfaces.py. After merging this PR, it will be available in the develop branch, and it will be available in the main branch once we release the next minor version (which I think we should do soon).
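
For concreteness, here is a rough sketch of what such an interface could look like (the class, method, and module names are hypothetical placeholders, not a final design):

from speechbrain.pretrained.interfaces import Pretrained

class EmotionDiarization(Pretrained):
    """Hypothetical interface: audio file in, emotion segments out,
    similar in spirit to the VAD interface."""

    MODULES_NEEDED = ["wav2vec2", "output_mlp"]  # placeholder names

    def diarize_file(self, path):
        # load_audio is provided by the Pretrained base class
        waveform = self.load_audio(path)
        batch = waveform.unsqueeze(0).to(self.device)
        # Frame-level emotion predictions would be computed here and
        # decoded into labeled segments (start, end, emotion):
        ...

Once it lives in interfaces.py, users could load it with EmotionDiarization.from_hparams(source=...) just like the other pretrained models.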

@BenoitWang (Collaborator, Author)

Hi @mravanelli, thanks a lot for the comments. I've made the necessary modifications according to your comments, and everything should now be fine.

@mravanelli (Collaborator)

Thank you @BenoitWang,
I did a final check and everything seems to work properly. I think we can merge it!

@mravanelli mravanelli merged commit f02eafc into speechbrain:develop Jul 6, 2023