Conversation

@anautsch (Collaborator) commented Oct 12, 2022

The goal of this PR is to refactor recipe testing. Not much is needed, as most testing needs are already covered.

In particular, the tests/recipes.csv file causes PRs to run into merge conflicts whenever another recipe enters the list. Conflicts for recipes make sense to highlight when multiple contributors collaborate on the same dataset, but not across datasets.

Solution, part I:

  • split up the CSV into multiple, smaller ones (removing duplicate entry lines; compare LibriSpeech); a sketch follows below this list
  • drop the RecipeID field (the ID has no persistency service; for spontaneous use, the ID simply results from the row number)
  • slightly refactor the parsing mechanics to match (pytest tests/consistency works on CPU)
  • Tested on GPU node with:
    pytest tests/consistency
    ./tests/.run-HF-checks.sh
    ./tests/.run-load-yaml-tests.sh
    ./tests/.run-recipe-tests.sh
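
As a sketch of the intended layout (the exact column set after the split is an assumption here; field names follow tests/consistency/README.md and values are illustrative), a per-dataset file such as a hypothetical tests/recipes/LibriSpeech.csv would look like:

    Task,Dataset,Script_file,Hparam_file,test_debug_flags,test_debug_checks
    ASR,LibriSpeech,recipes/LibriSpeech/ASR/seq2seq/train.py,recipes/LibriSpeech/ASR/seq2seq/hparams/train_BPE_1000.yaml,--data_folder=tests/samples/ASR/ --number_of_epochs=2,

One file per dataset means two PRs touching different datasets no longer collide in a single shared CSV.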
    

Not all datasets are available for all recipes at all times; recipe testing needs to operate regardless. This requires extending tests/consistency/README.md:

  • test_debug_flags (optional):
    This optional field reports the flags to run recipe tests (see tests/.run-recipe-tests.sh). The goal of the recipe tests is to run an experiment with a tiny dataset and 1-make sure it runs 2-make sure it overfits properly.
    For instance, --data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=10 --skip_prep=True will run an experiments with the given train and hparams files using a tiny dataset (ASR_train.csv)

The existing testing uses the test_debug_flags flag here via test_flag[recipe_id]:

def run_recipe_tests(...):
    ...
    # Composing command to run: the recipe script, its hparams file, a fresh
    # output folder, the recipe's test_debug_flags, and the run options.
    cmd = (
        "python "
        + test_script[recipe_id]
        + " "
        + test_hparam[recipe_id]
        + " --output_folder="
        + output_fold
        + " "
        + test_flag[recipe_id]
        + " "
        + run_opts
    )

In tests/recipes.csv, the majority of test definitions leave test_debug_flags unspecified. When this field is specified, test_debug_checks is specified as well: for a minimal testing set-up, it verifies whether an expected performance criterion is reached after a given number of epochs.

This PR extends this concept: test_debug_flags hints at a sanity check, i.e., regardless of data, can the recipe finish without breaking? For the concern of this PR, test_debug_checks can remain unspecified. This is based on separating dataset preparation from the rest of the scripts (i.e. constructing dataset representations via JSON/CSV metadata); these very JSON and/or CSV files are needed in the dataio pipelines that are then handed to batches. These dataio pipelines can be as diverse as it gets. Just run:

grep -r utils.data_pipeline.takes recipes
grep -r utils.data_pipeline.takes tests/integration

To make it short, grep -r utils.data_pipeline.takes recipes | sed 's/.*takes..//' | sed 's/")//' | sed 's|",."|\n|g' | sort -u gives:

class_string; clean_wav; clean_wav; command; duration; emo; enh_wav; enh_wav; fold; ground_truth_phn_ends; label; language; mix_wav; noise_wav; noisy_wav; noisy_wav; params["input_type"]); params["target_type"]); path; phn; phones; s1_wav; s2_wav; s3_wav; segment; semantics; speech; spk_id; start; stop; text; trans; transcript; transcription; translation_0; wav; wrd

note: params["input_type"] is 'clean_wav' and params["target_type"] can be 'phones' or 'words' (presently).

and grep -r utils.data_pipeline.takes tests/integration | sed 's/.*takes..//' | sed 's/")//' | sed 's|",."|\n|g' | sort -u gives:

char; mix_wav; phn; s1_wav; s2_wav; speech; spk_id; wav
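
For orientation, these fields are consumed by pipeline definitions of roughly the following shape (a minimal sketch in the style of the recipes; "wav" stands in for any of the field names listed above):

    import speechbrain as sb

    # A typical dataio pipeline: "wav" is a JSON/CSV field consumed here;
    # the provided "sig" is the dynamic item later handed to batching.
    @sb.utils.data_pipeline.takes("wav")
    @sb.utils.data_pipeline.provides("sig")
    def audio_pipeline(wav):
        return sb.dataio.dataio.read_audio(wav)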

The first and foremost goal is to provide SpeechBrain with the flexibility that research needs. Since this part of testing hints at some form of round-trip testing, some more rigid notions come into play. Mappings help with alignment. The recipe data pipelines should neither be touched nor overwritten (they may reappear within the recipe-declared Brain class). Here, the data defined by existing integration tests comes in handy, since they need to provide testing data corresponding to the Task field defined in tests/recipes.csv!

Note: the above is an assumption right now; let's see what is achievable.
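
As a purely hypothetical illustration of such a mapping (the names below are made up for this sketch and are not part of the PR):

    # Hypothetical: align the Task field of tests/recipes.csv with sample
    # data already shipped for the integration tests.
    TASK_TO_TEST_SAMPLES = {
        "ASR": "tests/samples/ASR",
        "Separation": "tests/samples/separation",    # illustrative path
        "Enhancement": "tests/samples/enhancement",  # illustrative path
    }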

Solution, part II:

  • split dataio & functionality testing (a given; meaning: ensure test_debug_flags is adequately defined for all recipe tests)
  • check it runs on CPU
  • check ALL recipes are tested on a GPU node
  • rework the 'avoid_list' mechanics in recipe testing (CPU vs CUDA testing discrepancy), i.e. check on the load_yaml_test function; a sketch follows after this list
  • test_debug_flags from 'optional' -> 'mandatory' (default: None if there is no !PLACEHOLDER in yaml files)
    Note: HF_repo & test_debug_checks should remain 'optional' (out-of-scope for this PR)
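
A rough illustration of what the reworked 'avoid_list' could become (hypothetical; names and structure are not taken from the codebase):

    # Hypothetical device-aware skip list: recipes that cannot run on a
    # given device are skipped instead of failing the whole suite.
    AVOID_LIST = {
        "cpu": {"recipes/SomeDataset/too_heavy_for_cpu_train.py"},  # illustrative entry
        "cuda": set(),
    }

    def should_skip(script, device):
        return script in AVOID_LIST.get(device, set())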

@anautsch (Collaborator, Author):

Status: checks are passing & the necessary fixes were deployed. Continuing with part II.


There are remaining issues, in brief:

(14/43) Checking https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa...
	y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 
(16/43) Checking https://huggingface.co/speechbrain/lang-id-voxlingua107-ecapa...
	Failed to load audio from udhr_th.mp3
(19/43) Checking https://huggingface.co/speechbrain/asr-wav2vec2-commonvoice-fr...
	y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 
(33/43) Checking https://huggingface.co/speechbrain/asr-wav2vec2-commonvoice-rw...
	Failed to load audio from example.mp3
(41/43) Checking https://huggingface.co/speechbrain/vad-crdnn-libriparty...
	Failed to fetch metadata from pretrained_model_checkpoints/example_vad.wav
(42/43) Checking https://huggingface.co/speechbrain/asr-wav2vec2-ctc-aishell...
	y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 

41: https://huggingface.co/speechbrain/vad-crdnn-libriparty snippet needs to be revised.

16+33: Download issues for mp3s are known (and from what I remember caused by dependencies; a later task).

14+19+42: The PyTorch bugs share similar stack traces, e.g.

  File "speechbrain/pretrained/interfaces.py", line 979, in classify_file
    waveform = self.load_audio(path)
  File "speechbrain/pretrained/interfaces.py", line 245, in load_audio
    return self.audio_normalizer(signal, sr)
  File "speechbrain/dataio/preprocess.py", line 56, in __call__
    resampled = resampler(audio.unsqueeze(0)).squeeze(0)
  File "torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "speechbrain/processing/speech_augmentation.py", line 600, in forward
    resampled_waveform = self._perform_resample(waveforms)
  File "speechbrain/processing/speech_augmentation.py", line 673, in _perform_resample
    conv_wave = torch.nn.functional.conv1d(
RuntimeError: y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 

SB pretrained models available via HF use speech_augmentation (which might or might not be necessary).


Note:

Tests on CPU w/ internet access:

pytest tests/consistency
./tests/.run-load-yaml-tests.sh
./tests/.run-HF-checks.sh

Tests that can download data on CPU first and then move to an offline GPU node:

./tests/.run-recipe-tests.sh

@anautsch (Collaborator, Author) commented Oct 17, 2022

Note: to install all extra requirements declared in recipes:

find recipes | grep extra | xargs cat | sort -u | grep -v \# | xargs -I {} pip install {}

Tokenizer recipe testing (w/o data preparation) is functional for all recipes w/ Tokenizers.

python -c 'from speechbrain.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Task"], filters=["Tokenizer"])) else print("TEST PASSED")'

@anautsch (Collaborator, Author):

AISHELL-1 recipe tests pass

python -c 'from speechbrain.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Dataset"], filters=["AISHELL-1"], download_only=True)) else print("TEST PASSED")'

# then offline
TRANSFORMERS_OFFLINE=1 python -c 'from speechbrain.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Dataset"], filters=["AISHELL-1"])) else print("TEST PASSED")'
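
(TRANSFORMERS_OFFLINE=1 tells the transformers library to rely only on its local cache, matching the offline GPU node in the second step.)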

note: these tests are w/o data preparation & w/o performance checks

@anautsch (Collaborator, Author) commented Oct 18, 2022

Aishell1Mix recipes "work" - there is one change I did not commit.

Since Aishell1Mix links to LibriMix, which in turn invokes its logger

logger.info("Mean SISNR is {}".format(np.array(all_sisnrs).mean()))

that logger call fails: the logger only exists when the file runs as a script, not when it is imported.

... my solution was to add to the end of the above train file:

if __name__ == "__main__":
    ...  # existing entry point; the logger is created here when run as a script
else:
    # ensure a module-level logger also exists when train.py is imported
    logger = logging.getLogger(__name__)

so in any case, a logger is created.

I do not dare to push this change to this PR - hence "works".
Alternative solutions may look into refactoring a self.logger or something else (not in the scope of this PR).

edit: moved the logger in LibriMix to the top of the recipe (as in other recipes).


Note: I dropped this line and moved its logic to the yaml files

hparams["data_folder"] += f'/aishell1mix/Aishell1Mix{hparams["num_spks"]}'

which now have a different point of entry, dataset_folder, instead of the usual data_folder:

num_spks: 2
dataset_folder: !PLACEHOLDER
data_folder: !ref <dataset_folder>/aishell1mix/Aishell1Mix<num_spks>

This is a hotfix on my side: now both variables serve a unique purpose (and are not changed mid-script to be repurposed).
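
For example (illustrative value): launching with --dataset_folder=/corpora and num_spks: 2 resolves data_folder to /corpora/aishell1mix/Aishell1Mix2.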

@anautsch (Collaborator, Author):

> Also, I took this opportunity to run the pre-release tests

That's for when we are about to approach the next release.

> Could you please take a look @anautsch?

Not part of this PR. Getting to some other things first; I can certainly take a look after this PR.

@mravanelli @TParcollet What's your take on URLs that are identified as being down?
(Some will not come up again.)

@Adel-Moumen (Collaborator) left a comment:

Thank you for this PR @anautsch! You did impressive work.

Please add a header in tests/__init__.py.

I left a comment on each thing that should be improved imo.

I will now try the PR in practice; I'll get back in touch if I face any troubles.

@@ -1 +1 @@
transformers==4.13
Collaborator:

Why?

Collaborator (Author):

move all extras to latest dependencies.

It's a testing convenience issue; otherwise each recipe will need its own env... these tests should be done at every release version. For the next major release, I'd like to either lift that freeze or pin to the latest version if possible.

I get that just leaving it there is convenient for the end-user; it's not for maintenance through testing. On the other hand, rewriting the testing scripts to change envs per recipe: why not (either)...

@@ -1 +1 @@
transformers==4.13
Collaborator:

Why?

Collaborator (Author):

same

source: !ref <whisper_hub>
freeze: !ref <freeze_whisper>
save_path: !ref <save_folder>/whisper_checkpoint
save_path: !ref <whisper_folder>/whisper_checkpoint
Collaborator:

It means that it will create a subfolder whisper_checkpoint in the folder whisper_checkpoint... better to put only !ref <whisper_folder> than !ref <whisper_folder>/whisper_checkpoint.
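
In other words (with whisper_folder presumably resolving to <save_folder>/whisper_checkpoint), keeping only save_path: !ref <whisper_folder> stores the checkpoint directly in <save_folder>/whisper_checkpoint instead of <save_folder>/whisper_checkpoint/whisper_checkpoint.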

print(f"\tsame: {results[repo]['same'] }")


def test_performance(
Collaborator:

same

hparams_file, run_opts, overrides = speechbrain.parse_arguments(
sys.argv[1:]
)
# speechbrain.utils.distributed.ddp_init_group(run_opts)
Collaborator:

Why is this line commented?

Collaborator (Author):

Good point. I think because I use this for single-GPU only; I'm working on another PR that does DDP testing now. I left it in as an indicator, I think, but it can surely be dropped. The aim here was to test with the fewest parameters possible & in debug mode.
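
For context, the quoted lines follow the usual recipe entry-point pattern (a sketch; for multi-GPU runs the DDP call is required):

    import sys
    import speechbrain as sb

    # Parse the hparams file, run options (e.g. --device) and overrides from the CLI.
    hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

    # Required for DDP runs; commented out here since the debug tests are single-GPU.
    sb.utils.distributed.ddp_init_group(run_opts)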

Link to the HuggingFace repository containing the pre-trained model (e.g., `https://huggingface.co/speechbrain/asr-wav2vec2-librispeech`). If specified, it must be mentioned in the README file.
- test_debug_flags (optional):
This optional field reports the flags to run recipe tests (see `tests/.run-recipe-tests.sh`). The goal of the recipe tests is to run an experiment with a tiny dataset and 1-make sure it runs 2-make sure it overfits properly.
For instance, `--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=10 --skip_prep=True` will run an experiments with the given train and hparams files using a tiny dataset (ASR_train.csv)
Collaborator:

will run an experiments -> will run an experiment

```

Summary of changes in V.A with comments and tasks (for reviewers):
1. new break in `func_sig` declartion (in the signature of the function)
Collaborator:

declartion -> declaration

@@ -0,0 +1,218 @@
# Guiding contributors, reviewers & maintainers through the complexity of SpeechBrain testing.
Collaborator:

I think it would be better to put it in our documentation (https://speechbrain.readthedocs.io/en/latest/contributing). I don't think it should be put in our README; it does not belong there. Too much detail.

@anautsch (Collaborator, Author):

The comments are addressed and markdown files are relocated. Conflicts with the latest develop branch are resolved.

@TParcollet @Adel-Moumen about the md relocations:

  • Approaches.md & PyCoverage.md to readthedocs as coverage
  • Future.md to Discussion 1814
  • Multiplatform integrated into general README
  • README.md to readthedocs as guidance
  • Maintainer.md integrated into tests/PRE-RELEASE-TESTS.md
  • PR_cycle.md to tests/README.md
  • Recipes.md to tests/recipes/README.md
  • Refactoring.md to tests/utils/README.md
  • rmdir tests/coverage

The readthedocs toc is updated (it's in the docs folder of this PR).

@ycemsubakan please let me know after your paper deadline if the proposed changes to your recipes are ok.

Currently, the LibriSpeech recipes are re-running to see where we are after all these changes (of this PR and on develop). This also addresses moving the transformers library pin to v4.25.1 (moving the extra_requirements to their latest versions).

@ycemsubakan (Collaborator):

Thank you @anautsch for this PR. This is really amazing! It will help us a lot in making our toolkit more reliable and stable.

Here are my comments

1. It looks like there are a few conflicts to solve.

2. As for the frozen vs unfrozen version of transformers, I would simply try whether everything runs on a full recipe with the latest version. For instance, you can try running `recipes/TIMIT/ASR/seq2seq/` and `recipes/TIMIT/ASR/transducer/`.
   If possible, I would also check `self-supervised-learning/wav2vec2/`, as @TParcollet said that there might be errors with the latest transformers. If everything works with the latest version, I would keep it.

3. @ycemsubakan, there are a few things for you to check at your earliest convenience:


* a. check the unsqueeze() in `recipes/BinauralWSJ0Mix/separation/train.py`

* b. check changes in `recipes/REAL-M/sisnr-estimation/train.py`

* c. check changes in `recipes/WHAMandWHAMR/enhancement/train.py`
  Ideally, you should try to re-run a real recipe (1 epoch is enough) to make sure everything is fine.


4. Fix `recipes/LibriSpeech/ASR/transformer/hparams/train_hf_whisper.yaml` as mentioned by Titouan; currently it leads to a folder like <save_folder>/whisper_checkpoint/whisper_checkpoint.

5. Remove tests/coverage/Future.md

Also, I noticed that some libraries have little test coverage. We need to improve that in the future, but in another PR.

I have gone through these recipes. Please see my comment above. REAL-M and WHAMandWHAMR seem to be fine, but please see my comments. For BinauralWSJ0Mix, it seems that the added .unsqueeze() breaks the code.

@anautsch (Collaborator, Author):

> I have gone through these recipes. Please see my comment above. REAL-M and WHAMandWHAMR seem to be fine, but please see my comments. For BinauralWSJ0Mix, it seems that the added .unsqueeze() breaks the code.

Thank you @ycemsubakan! I removed the added .unsqueeze(), and tests for BinauralWSJ0Mix are passing now!

python -c 'from tests.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Dataset"], filters=[["BinauralWSJ0Mix"]], do_checks=False, run_opts="--device=cuda")) else print("TEST PASSED")'

@anautsch (Collaborator, Author) commented Feb 6, 2023

@ycemsubakan neat!

@TParcollet @mravanelli how'd you like to proceed with approving changes & merging this one?

@TParcollet (Collaborator) left a comment:

LGTM

@TParcollet merged commit 4b485a3 into speechbrain:develop Feb 7, 2023
@asumagic mentioned this pull request Jun 14, 2023