
Conversation

@BenoitWang (Collaborator) commented Jun 22, 2023

Hi @mravanelli, this is a first commit of the ZaionEmotionDataset recipe. Some small changes are still to come very soon. Paper link available here: https://arxiv.org/pdf/2306.12991.pdf

@mravanelli mravanelli self-requested a review June 26, 2023 14:21
@mravanelli mravanelli self-assigned this Jun 26, 2023
@mravanelli (Collaborator)

Hi @BenoitWang, thank you for this PR. Feel free to fix the small indentation issue that prevents the tests from passing. Then, I can proceed with the review.

@mravanelli (Collaborator)

Hi @BenoitWang,
thank you for this PR! Here are my comments from the first code inspection:

Extra-dependencies:

  • The file should be called extra_requirements.txt (just to be uniform with the other ones).
  • The current version of the file takes this form:
pathlib==1.0.1
pydub==0.25.1
transformers==4.26.0
webrtcvad==2.0.10
  • Normally, it is better to have something like "dependency>=version". Otherwise, if you have a newer version of the library, it gets uninstalled, which is quite annoying. I would take this opportunity to make sure the code works with the latest version of each dependency. For instance, note that some changes have been made in transformers >= 4.30 (and our goal is to switch to that in other PRs; see "Change needed in Whisper fine-tuning recipe to accommodate transformers 4.30.0" #2016).

  • It looks like you are using webrtcvad as a VAD. This is pretty good, but we also have the SpeechBrain VAD (see https://huggingface.co/speechbrain/vad-crdnn-libriparty) that might be considered (in our experience it works well in some contexts and not in others, but if the performance is good for this dataset, it is better to use our own VAD; a minimal usage sketch follows this list).
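
For reference, here is roughly how the SpeechBrain VAD is used, following the model card of vad-crdnn-libriparty (the audio file name below is a placeholder):

from speechbrain.pretrained import VAD

# Download the pretrained CRDNN VAD trained on LibriParty
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
)
# Speech segment boundaries (start/end in seconds) for an audio file
boundaries = vad.get_speech_segments("audio_file.wav")  # placeholder path
vad.save_boundaries(boundaries)  # prints the detected segments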

README

  • Add the paper link and citation when available.
  • The Dropbox links for RAVDESS, ESD, JL-CORPUS, and EmoV-DB are not working (permission issues?).
  • Add a table with the results.
  • Upload the logs to Dropbox and add a link to them in the README file.

DOCUMENTATION

TRAINING SCRIPT

  • In the forward function of train.py, I can see .to("cuda:0"); shouldn't it be .to(self.device)? Similarly, we have hparams["wav2vec2"] = hparams["wav2vec2"].to("cuda") (see the sketch after this list).
  • In train.py, some lines are commented out, like # replacements={"data_root": hparams["data_folder"]} or # sb.dataio.dataset.add_dynamic_item(datasets, label_pipeline). Do we need to uncomment them?
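
For illustration, a minimal sketch of the suggested device handling (the class name, function body, and module name here are hypothetical, not copied from the recipe):

import speechbrain as sb

class EmoDiaBrain(sb.Brain):  # hypothetical class name
    def compute_forward(self, batch, stage):
        # self.device is set by the Brain class from the run options,
        # so this works on CPU or on any GPU, unlike a hard-coded
        # .to("cuda:0")
        batch = batch.to(self.device)
        wavs, lens = batch.sig
        return self.modules.wav2vec2(wavs)  # hypothetical module name

Also note that the modules registered in the Brain's modules dict are moved to the right device automatically, so the explicit hparams["wav2vec2"].to("cuda") should not be needed.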

@mravanelli (Collaborator)

Also, as this is a new recipe, it should be added in speechbrain/tests/recipes. Feel free to follow up with me if you cannot figure out how to do it from the existing examples.

@BenoitWang (Collaborator, Author)

Hi @mravanelli, thanks for the comments. I've made the above modifications. To do:

  1. Test the SpeechBrain VAD.
  2. Put the logs on Dropbox (I still have some experiments to run).
  3. Add the recipe to speechbrain/tests/recipes.

@mravanelli (Collaborator)

Thank you @BenoitWang for addressing my previous comments. I can now successfully run the recipe. However, I have a few more comments to share:

  1. I suggest making some improvements to the README. For example, it would be helpful to include an example of what an input audio and the output of an emotion diarization system could look like (see the illustrative sketch after this list). Additionally, the reference to the paper mentioned in relation to emotion diarization is incorrect (it refers to a paper on LLMs). I recommend adding our paper as the appropriate reference.

  2. Please correct the following error in the README: you currently mention that to train the model, we should run python train_with_wav2vec.py hparams/train_with_wav2vec.yaml, but it should actually be python train.py hparams/train.yaml.

  3. During the data preparation, I noticed several warnings like "sox WARN dither: dither clipped 1 sample; decrease volume?" appearing. Please double-check if this is normal. If it is, we should consider suppressing these warnings.

  4. In the current version, all dataset paths must end with a slash ("/"), for example: $SLURM_TMPDIR/emotion/ESD/. Otherwise, an error occurs indicating a missing file for the ESD dataset. To make the data preparation more robust, I suggest properly handling paths that do not end with a slash (see the sketch after this list).

  5. In the table, we should only report the results with WavLM and mention that other results can be found in the paper. Additionally, besides providing the link to the Hugging Face (HF) model, we need to include the link to the new SpeechBrain Dropbox repo. I can grant you access to upload the output folder there.

  6. I believe it's necessary to create a suitable inference interface to support this new task. The input should be audio, such as a long recording, and the output should resemble what is provided in the VAD interface (e.g., https://huggingface.co/speechbrain/vad-crdnn-libriparty). Once the interface is ready, we can upload the model to the SpeechBrain HF.
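
Regarding point 1, the README example could look something like this (purely illustrative, with made-up timestamps; not taken from the recipe):

# Input: a long recording, e.g. "example.wav"
# Output: emotion segments with boundaries in seconds, e.g.:
{
    "example.wav": [
        {"start": 0.0, "end": 2.4, "emotion": "neutral"},
        {"start": 2.4, "end": 5.1, "emotion": "angry"},
    ]
}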
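
Regarding point 4, a minimal sketch of the more robust path handling (the function and variable names are hypothetical):

import os

def build_wav_path(data_folder, relative_path):
    # os.path.join works whether or not the user-provided folder
    # ends with a slash, e.g. ".../ESD" or ".../ESD/":
    return os.path.join(data_folder, relative_path)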

Please take these comments into consideration. Let me know if you have any questions or need further clarification.

Best regards,

@mravanelli (Collaborator)

Thank you @BenoitWang!
Let me share a couple of comments before merging this PR.

  1. In the README, you should add a note on the training time and the GPU needed to train the model. I tried using a 32GB GPU and it ran out of memory. Maybe we can add a note saying that the batch size can be reduced if you run out of memory (see the note at the end of this comment).

  2. I cannot run the example from the HF repo: https://huggingface.co/speechbrain/emotion-diarization-wavlm-large/tree/main. I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/ravanelm/speechbrain_emo/speechbrain/pretrained/interfaces.py", line 125, in foreign_class
    run_on_main(
  File "/scratch/ravanelm/speechbrain_emo/speechbrain/utils/distributed.py", line 61, in run_on_main
    func(*args, **kwargs)
  File "/scratch/ravanelm/speechbrain_emo/speechbrain/utils/parameter_transfer.py", line 246, in collect_files
    path = fetch(
  File "/scratch/ravanelm/speechbrain_emo/speechbrain/pretrained/fetching.py", line 161, in fetch
    fetched_file = huggingface_hub.hf_hub_download(
  File "/scratch/ravanelm/myenv_scratch/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/scratch/ravanelm/myenv_scratch/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/home/ywang/zed_pr/sed_hf'. Use `repo_type` argument if needed.

Any idea? The API on the HF website is also not working.
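
As a side note on point 1: assuming the batch size is exposed as a batch_size field in the hparams YAML (as in most SpeechBrain recipes), it can be reduced from the command line without editing the file, e.g.:

python train.py hparams/train.yaml --batch_size=2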

@mravanelli (Collaborator)

Also, instead of using foreign_class as an interface, I would suggest adding a standard interface in speechbrain/pretrained/interfaces.py. After merging this PR, it will be available in the develop branch, and it will be available in the main branch once we release the next minor version (which I think we should do soon).
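
For concreteness, here is a rough sketch of what such an interface could look like (the class, method, and module names are hypothetical placeholders, not a final design):

from speechbrain.pretrained.interfaces import Pretrained

class EmotionDiarization(Pretrained):
    """Hypothetical interface: audio file in, emotion segments out,
    similar in spirit to the VAD interface."""

    MODULES_NEEDED = ["wav2vec2", "output_mlp"]  # placeholder names

    def diarize_file(self, path):
        # load_audio is provided by the Pretrained base class
        waveform = self.load_audio(path)
        batch = waveform.unsqueeze(0).to(self.device)
        # Frame-level emotion predictions would be computed here and
        # decoded into labeled segments (start, end, emotion):
        ...

Once it lives in interfaces.py, users could load it with EmotionDiarization.from_hparams(source=...) just like the other pretrained models.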

@BenoitWang (Collaborator, Author)

Hi @mravanelli, thanks a lot for the comments. I've made the necessary modifications according to your comments, and everything should now be fine.

@mravanelli (Collaborator)

Thank you @BenoitWang,
I did a final check and everything seems to work properly. I think we can merge it!

@mravanelli mravanelli merged commit f02eafc into speechbrain:develop Jul 6, 2023