Conversation

@salah-zaiem
Collaborator

@salah-zaiem salah-zaiem commented Jun 22, 2023

  • Added a small class that outputs a weighted sum of an SSL model's layers, with the weights learned during fine-tuning.
  • Downstream benchmark on 6 tasks.
    More details about this benchmark are available in this published paper:
    https://arxiv.org/abs/2306.00452
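The core idea of the added class (a layer-wise weighted sum over a frozen SSL encoder's hidden states, with softmax-normalized weights trained jointly with the downstream head) can be sketched roughly as follows. Class and argument names here are illustrative, not the actual SpeechBrain API:

```python
import torch
import torch.nn as nn


class WeightedSSLFeatures(nn.Module):
    """Hypothetical sketch: combine the hidden layers of an SSL encoder
    using one learnable scalar weight per layer, normalized via softmax."""

    def __init__(self, num_layers):
        super().__init__()
        # Zero init -> softmax yields uniform weights at the start of training.
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):
        # hidden_states: (num_layers, batch, time, feats) stacked layer outputs.
        norm_weights = torch.softmax(self.layer_weights, dim=0)
        # Broadcast each scalar weight over its layer and sum across layers.
        return (norm_weights.view(-1, 1, 1, 1) * hidden_states).sum(dim=0)
```

Because the weights are ordinary parameters, they receive gradients during downstream fine-tuning even when the SSL encoder itself stays frozen.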

@salah-zaiem salah-zaiem requested a review from TParcollet June 22, 2023 17:55
@mravanelli
Collaborator

Thank you @salah-zaiem for this great work!

  • I think the main question to discuss is how we would like to support and structure benchmarks in SpeechBrain.
    One idea is to put everything under speechbrain/recipes/benchmarks/SSL_Benchmark. Then we need to add a full section in our main speechbrain README.md and website that properly advertises the benchmarks we have.

  • Currently, we are working on CL-MASR (the continual learning one) and MOABB (the EEG one), and we need some consistency. For CL-MASR (see CL-MASR #2033) things are a bit easier, as there we have a single dataset and many baselines. In the SSL benchmark, instead, we have different models and different datasets/tasks. This creates a significant overlap with standard recipes, which I don't think we can avoid easily.

@salah-zaiem and @TParcollet do you have suggestions?

@salah-zaiem
Collaborator Author

> Thank you @salah-zaiem for this great work!
>
>   • I think the main question to discuss is how we would like to support and structure benchmarks in SpeechBrain.
>     One idea is to put everything under speechbrain/recipes/benchmarks/SSL_Benchmark. Then we need to add a full section in our main speechbrain README.md and website that properly advertises the benchmarks we have.
>   • Currently, we are working on CL-MASR (the continual learning one) and MOABB (the EEG one), and we need some consistency. For CL-MASR (see CL-MASR #2033) things are a bit easier, as there we have a single dataset and many baselines. In the SSL benchmark, instead, we have different models and different datasets/tasks. This creates a significant overlap with standard recipes, which I don't think we can avoid easily.
>
> @salah-zaiem and @TParcollet do you have suggestions?

Putting everything under a Benchmark folder seems like a good idea.
There is indeed a lot of overlap with a certain number of recipes. A first idea: I could generally replace the preparation scripts ("{dataset}_prepare.py") with symlinks, I guess. The rest is not straightforward to remove; it is more or less equivalent to adding a train_with_wav2vec script in the folders of the different recipes (though not identical, because of the weighting).
Another option would be to expose the SSL weighted feature extraction as something you can call in compute_features, by adding it to "features.py". But that would make using the benchmark even more complicated (even though it may be useful for other recipes).

@mravanelli
Collaborator

Thank you @salah-zaiem. We discussed the role of benchmarks in the past speechbrain general meeting and we agree that benchmarks should go into an independent repository (speechbrain/benchmarks). For now, it is private, but I gave you access with writing permissions. Would you mind closing this PR and uploading your code there?

@salah-zaiem
Collaborator Author

> Thank you @salah-zaiem. We discussed the role of benchmarks in the past speechbrain general meeting and we agree that benchmarks should go into an independent repository (speechbrain/benchmarks). For now, it is private, but I gave you access with writing permissions. Would you mind closing this PR and uploading your code there?

I can close it, but I still need to get the WeightedSSLModel into SpeechBrain to run the benchmark code. I kept this PR open only for that for the moment. Maybe I should just open a new PR?

@mravanelli mravanelli merged commit 930ebb8 into speechbrain:develop Jul 2, 2023