Conversation

@anautsch (Collaborator) commented Oct 12, 2022

The goal of this PR is to refactor recipe testing. Not much is needed, as most testing needs are already covered.

In particular, the tests/recipes.csv file causes PRs to run into merge conflicts whenever another recipe enters the list. Conflicts for recipes make sense to highlight when multiple contributors collaborate on the same dataset, but not across datasets.

Solution, part I:

  • split up the CSV into multiple, smaller ones (removing duplicate entry lines; compare LibriSpeech); a sketch follows below this list
  • drop the RecipeID field (the ID has no persistency service; for spontaneous use, the ID simply results from the row number)
  • slightly refactor the parsing mechanics to match (pytest tests/consistency works on CPU)
  • Tested on GPU node with:
    pytest tests/consistency
    ./tests/.run-HF-checks.sh
    ./tests/.run-load-yaml-tests.sh
    ./tests/.run-recipe-tests.sh
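
As a sketch of the intended layout (the exact column set after the split is an assumption here; field names follow tests/consistency/README.md and values are illustrative), a per-dataset file such as a hypothetical tests/recipes/LibriSpeech.csv would look like:

    Task,Dataset,Script_file,Hparam_file,test_debug_flags,test_debug_checks
    ASR,LibriSpeech,recipes/LibriSpeech/ASR/seq2seq/train.py,recipes/LibriSpeech/ASR/seq2seq/hparams/train_BPE_1000.yaml,--data_folder=tests/samples/ASR/ --number_of_epochs=2,

One file per dataset means two PRs touching different datasets no longer collide in a single shared CSV.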
    

Not all datasets are available for all recipes at all times; recipe testing needs to operate regardless. This requires extending tests/consistency/README.md:

  • test_debug_flags (optional):
    This optional field reports the flags to run recipe tests (see tests/.run-recipe-tests.sh). The goal of the recipe tests is to run an experiment with a tiny dataset and 1-make sure it runs 2-make sure it overfits properly.
    For instance, --data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=10 --skip_prep=True will run an experiments with the given train and hparams files using a tiny dataset (ASR_train.csv)

The existing testing uses the test_debug_flags flag here via test_flag[recipe_id]:

def run_recipe_tests(...):
    ...
    # Composing command to run: the recipe script, its hparams file, a fresh
    # output folder, the recipe's test_debug_flags, and the run options.
    cmd = (
        "python "
        + test_script[recipe_id]
        + " "
        + test_hparam[recipe_id]
        + " --output_folder="
        + output_fold
        + " "
        + test_flag[recipe_id]
        + " "
        + run_opts
    )

In tests/recipes.csv, the majority of test definitions leave test_debug_flags unspecified. When this field is specified, test_debug_checks is specified as well: for a minimal testing set-up, it verifies whether an expected performance criterion is reached after a given number of epochs.

This PR extends this concept: test_debug_flags hints at a sanity check, i.e., regardless of data, can the recipe finish without breaking? For the concern of this PR, test_debug_checks can remain unspecified. This is based on separating dataset preparation from the rest of the scripts (i.e. constructing dataset representations via JSON/CSV metadata); these very JSON and/or CSV files are needed in the dataio pipelines that are then handed to batches. These dataio pipelines can be as diverse as it gets. Just run:

grep -r utils.data_pipeline.takes recipes
grep -r utils.data_pipeline.takes tests/integration

To make it short, grep -r utils.data_pipeline.takes recipes | sed 's/.*takes..//' | sed 's/")//' | sed 's|",."|\n|g' | sort -u gives:

class_string; clean_wav; clean_wav; command; duration; emo; enh_wav; enh_wav; fold; ground_truth_phn_ends; label; language; mix_wav; noise_wav; noisy_wav; noisy_wav; params["input_type"]); params["target_type"]); path; phn; phones; s1_wav; s2_wav; s3_wav; segment; semantics; speech; spk_id; start; stop; text; trans; transcript; transcription; translation_0; wav; wrd

note: params["input_type"] is 'clean_wav' and params["target_type"] can be 'phones' or 'words' (presently).

and grep -r utils.data_pipeline.takes tests/integration | sed 's/.*takes..//' | sed 's/")//' | sed 's|",."|\n|g' | sort -u gives:

char; mix_wav; phn; s1_wav; s2_wav; speech; spk_id; wav
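
For orientation, these fields are consumed by pipeline definitions of roughly the following shape (a minimal sketch in the style of the recipes; "wav" stands in for any of the field names listed above):

    import speechbrain as sb

    # A typical dataio pipeline: "wav" is a JSON/CSV field consumed here;
    # the provided "sig" is the dynamic item later handed to batching.
    @sb.utils.data_pipeline.takes("wav")
    @sb.utils.data_pipeline.provides("sig")
    def audio_pipeline(wav):
        return sb.dataio.dataio.read_audio(wav)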

The first and foremost goal is to provide SpeechBrain with the flexibility that research needs. Since this part of testing hints at some form of round-trip testing, some more rigid notions come into play. Mappings help with alignment. The recipe data pipelines should neither be touched nor overwritten (they may reappear within the recipe-declared Brain class). Here, the data defined by existing integration tests comes in handy, since they need to provide testing data corresponding to the Task field defined in tests/recipes.csv!

Note: the above is an assumption right now; let's see what is achievable.
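
As a purely hypothetical illustration of such a mapping (the names below are made up for this sketch and are not part of the PR):

    # Hypothetical: align the Task field of tests/recipes.csv with sample
    # data already shipped for the integration tests.
    TASK_TO_TEST_SAMPLES = {
        "ASR": "tests/samples/ASR",
        "Separation": "tests/samples/separation",    # illustrative path
        "Enhancement": "tests/samples/enhancement",  # illustrative path
    }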

Solution, part II:

  • split dataio & functionality testing (a given; meaning: ensure test_debug_flags is adequately defined for all recipe tests)
  • check it runs on CPU
  • check ALL recipes are tested on a GPU node
  • rework the 'avoid_list' mechanics in recipe testing (CPU vs CUDA testing discrepancy), i.e. check on the load_yaml_test function; a sketch follows after this list
  • test_debug_flags from 'optional' -> 'mandatory' (default: None if there is no !PLACEHOLDER in yaml files)
    Note: HF_repo & test_debug_checks should remain 'optional' (out-of-scope for this PR)
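
A rough illustration of what the reworked 'avoid_list' could become (hypothetical; names and structure are not taken from the codebase):

    # Hypothetical device-aware skip list: recipes that cannot run on a
    # given device are skipped instead of failing the whole suite.
    AVOID_LIST = {
        "cpu": {"recipes/SomeDataset/too_heavy_for_cpu_train.py"},  # illustrative entry
        "cuda": set(),
    }

    def should_skip(script, device):
        return script in AVOID_LIST.get(device, set())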

@anautsch (Collaborator, Author):

Status: checks are passing & the necessary fixes were deployed. Continuing with part II.


There are remaining issues, in brief:

(14/43) Checking https://huggingface.co/speechbrain/lang-id-commonlanguage_ecapa...
	y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 
(16/43) Checking https://huggingface.co/speechbrain/lang-id-voxlingua107-ecapa...
	Failed to load audio from udhr_th.mp3
(19/43) Checking https://huggingface.co/speechbrain/asr-wav2vec2-commonvoice-fr...
	y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 
(33/43) Checking https://huggingface.co/speechbrain/asr-wav2vec2-commonvoice-rw...
	Failed to load audio from example.mp3
(41/43) Checking https://huggingface.co/speechbrain/vad-crdnn-libriparty...
	Failed to fetch metadata from pretrained_model_checkpoints/example_vad.wav
(42/43) Checking https://huggingface.co/speechbrain/asr-wav2vec2-ctc-aishell...
	y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 

41: https://huggingface.co/speechbrain/vad-crdnn-libriparty snippet needs to be revised.

16+33: Download issues for mp3s are known (and from what I remember caused by dependencies; a later task).

14+19+42: The PyTorch bugs share similar stack traces, e.g.

  File "speechbrain/pretrained/interfaces.py", line 979, in classify_file
    waveform = self.load_audio(path)
  File "speechbrain/pretrained/interfaces.py", line 245, in load_audio
    return self.audio_normalizer(signal, sr)
  File "speechbrain/dataio/preprocess.py", line 56, in __call__
    resampled = resampler(audio.unsqueeze(0)).squeeze(0)
  File "torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "speechbrain/processing/speech_augmentation.py", line 600, in forward
    resampled_waveform = self._perform_resample(waveforms)
  File "speechbrain/processing/speech_augmentation.py", line 673, in _perform_resample
    conv_wave = torch.nn.functional.conv1d(
RuntimeError: y.get_desc().is_nhwc() INTERNAL ASSERT FAILED at "../aten/src/ATen/native/mkldnn/Conv.cpp":143, please report a bug to PyTorch. 

SB pretrained models available via HF use speech_augmentation (which might or might not be necessary).


Note:

Tests on CPU w/ internet access:

pytest tests/consistency
./tests/.run-load-yaml-tests.sh
./tests/.run-HF-checks.sh

Tests that can download data on CPU first and then move to an offline GPU node:

./tests/.run-recipe-tests.sh

@anautsch (Collaborator, Author) commented Oct 17, 2022

Note: to install all extra requirements declared in recipes:

find recipes | grep extra | xargs cat | sort -u | grep -v \# | xargs -I {} pip install {}

Tokenizer recipe testing (w/o data preparation) is functional for all recipes w/ Tokenizers.

python -c 'from speechbrain.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Task"], filters=["Tokenizer"])) else print("TEST PASSED")'

@anautsch (Collaborator, Author):

AISHELL-1 recipe tests pass

python -c 'from speechbrain.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Dataset"], filters=["AISHELL-1"], download_only=True)) else print("TEST PASSED")'

# then offline
TRANSFORMERS_OFFLINE=1 python -c 'from speechbrain.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Dataset"], filters=["AISHELL-1"])) else print("TEST PASSED")'
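
(TRANSFORMERS_OFFLINE=1 tells the transformers library to rely only on its local cache, matching the offline GPU node in the second step.)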

note: these tests are w/o data preparation & w/o performance checks

@anautsch (Collaborator, Author) commented Oct 18, 2022

Aishell1Mix recipes "work" - there is one change I did not commit.

Since Aishell1Mix links to LibriMix, which in turn invokes its logger

logger.info("Mean SISNR is {}".format(np.array(all_sisnrs).mean()))

that logger call fails: the logger only exists when the file runs as a script, not when it is imported.

... my solution was to add to the end of the above train file:

if __name__ == "__main__":
    ...  # existing entry point; the logger is created here when run as a script
else:
    # ensure a module-level logger also exists when train.py is imported
    logger = logging.getLogger(__name__)

so in any case, a logger is created.

I do not dare to push this change to this PR - hence "works".
Alternative solutions may look into refactoring a self.logger or something else (not in the scope of this PR).

edit: moved the logger in LibriMix to the top of the recipe (as in other recipes).


Note: I dropped this line and moved its logic to the yaml files

hparams["data_folder"] += f'/aishell1mix/Aishell1Mix{hparams["num_spks"]}'

which now have a different point of entry, dataset_folder, instead of the usual data_folder:

num_spks: 2
dataset_folder: !PLACEHOLDER
data_folder: !ref <dataset_folder>/aishell1mix/Aishell1Mix<num_spks>

This is a hotfix on my side: now both variables serve a unique purpose (and are not changed mid-script to be repurposed).
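
For example (illustrative value): launching with --dataset_folder=/corpora and num_spks: 2 resolves data_folder to /corpora/aishell1mix/Aishell1Mix2.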

@anautsch (Collaborator, Author):

> Also, I took this opportunity to run the pre-release tests

That's for when we are about to approach the next release.

> Could you please take a look @anautsch?

Not part of this PR. Getting to some other things first; I can certainly take a look after this PR.

@mravanelli @TParcollet What's your take on URLs that are identified as being down?
(Some will not come up again.)

@Adel-Moumen (Collaborator) left a comment:

Thank you for this PR @anautsch! You did impressive work.

Please add a header in tests/__init__.py.

I left a comment on each thing that should be improved imo.

I will now try the PR in practice; I'll get back in touch if I face any troubles.

@@ -1 +1 @@
transformers==4.13
Collaborator:

Why?

Collaborator (Author):

move all extras to latest dependencies.

It's a testing convenience issue; otherwise each recipe will need its own env... these tests should be done at every release version. For the next major release, I'd like to either lift that freeze or pin to the latest version if possible.

I get that just leaving it there is convenient for the end-user; it's not for maintenance through testing. On the other hand, rewriting the testing scripts to change envs per recipe: why not (either)...

@@ -1 +1 @@
transformers==4.13
Collaborator:

Why?

Collaborator (Author):

same

source: !ref <whisper_hub>
freeze: !ref <freeze_whisper>
save_path: !ref <save_folder>/whisper_checkpoint
save_path: !ref <whisper_folder>/whisper_checkpoint
Collaborator:

It means that it will create a subfolder whisper_checkpoint in the folder whisper_checkpoint... better to put only !ref <whisper_folder> than !ref <whisper_folder>/whisper_checkpoint.
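
In other words (with whisper_folder presumably resolving to <save_folder>/whisper_checkpoint), keeping only save_path: !ref <whisper_folder> stores the checkpoint directly in <save_folder>/whisper_checkpoint instead of <save_folder>/whisper_checkpoint/whisper_checkpoint.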

print(f"\tsame: {results[repo]['same'] }")


def test_performance(
Collaborator:

same

hparams_file, run_opts, overrides = speechbrain.parse_arguments(
sys.argv[1:]
)
# speechbrain.utils.distributed.ddp_init_group(run_opts)
Collaborator:

Why is this line commented?

Collaborator (Author):

Good point. I think because I use this for single-GPU only; I'm working on another PR that does DDP testing now. I left it in as an indicator, I think, but it can surely be dropped. The aim here was to test with the fewest parameters possible & in debug mode.
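
For context, the quoted lines follow the usual recipe entry-point pattern (a sketch; for multi-GPU runs the DDP call is required):

    import sys
    import speechbrain as sb

    # Parse the hparams file, run options (e.g. --device) and overrides from the CLI.
    hparams_file, run_opts, overrides = sb.parse_arguments(sys.argv[1:])

    # Required for DDP runs; commented out here since the debug tests are single-GPU.
    sb.utils.distributed.ddp_init_group(run_opts)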

Link to the HuggingFace repository containing the pre-trained model (e.g., `https://huggingface.co/speechbrain/asr-wav2vec2-librispeech`). If specified, it must be mentioned in the README file.
- test_debug_flags (optional):
This optional field reports the flags to run recipe tests (see `tests/.run-recipe-tests.sh`). The goal of the recipe tests is to run an experiment with a tiny dataset and 1-make sure it runs 2-make sure it overfits properly.
For instance, `--data_folder=tests/samples/ASR/ --train_csv=tests/samples/annotation/ASR_train.csv --valid_csv=tests/samples/annotation/ASR_train.csv --test_csv=[tests/samples/annotation/ASR_train.csv] --number_of_epochs=10 --skip_prep=True` will run an experiments with the given train and hparams files using a tiny dataset (ASR_train.csv)
Collaborator:

will run an experiments -> will run an experiment

```

Summary of changes in V.A with comments and tasks (for reviewers):
1. new break in `func_sig` declartion (in the signature of the function)
Collaborator:

declartion -> declaration

@@ -0,0 +1,218 @@
# Guiding contributors, reviewers & maintainers through the complexity of SpeechBrain testing.
Collaborator:

I think it would be better to put it in our documentation (https://speechbrain.readthedocs.io/en/latest/contributing). I don't think it should be put in our README; it does not belong there. Too much detail.

@anautsch (Collaborator, Author):

The comments are addressed and markdown files are relocated. Conflicts with the latest develop branch are resolved.

@TParcollet @Adel-Moumen about the md relocations:

  • Approaches.md & PyCoverage.md to readthedocs as coverage
  • Future.md to Discussion 1814
  • Multiplatform integrated into general README
  • README.md to readthedocs as guidance
  • Maintainer.md integrated into tests/PRE-RELEASE-TESTS.md
  • PR_cycle.md to tests/README.md
  • Recipes.md to tests/recipes/README.md
  • Refactoring.md to tests/utils/README.md
  • rmdir tests/coverage

The readthedocs toc is updated (it's in the docs folder of this PR).

@ycemsubakan please let me know after your paper deadline if the proposed changes to your recipes are ok.

Currently, the LibriSpeech recipes are re-running to see where we are after all these changes (of this PR and on develop). This also addresses moving the transformers library pin to v4.25.1 (moving the extra_requirements to their latest versions).

@ycemsubakan (Collaborator):

Thank you @anautsch for this PR. This is really amazing! It will help us a lot in making our toolkit more reliable and stable.

Here are my comments

1. It looks like there are a few conflicts to solve.

2. As for the frozen vs unfrozen version of transformers, I would simply try whether everything runs on a full recipe with the latest version. For instance, you can try running `recipes/TIMIT/ASR/seq2seq/` and `recipes/TIMIT/ASR/transducer/`.
   If possible, I would also check `self-supervised-learning/wav2vec2/`, as @TParcollet said that there might be errors with the latest transformers. If everything works with the latest version, I would keep it.

3. @ycemsubakan, there are a few things for you to check at your earliest convenience:


* a. check the unsqueeze() in `recipes/BinauralWSJ0Mix/separation/train.py`

* b. check changes in `recipes/REAL-M/sisnr-estimation/train.py`

* c. check changes in `recipes/WHAMandWHAMR/enhancement/train.py`
  Ideally, you should try to re-run a real recipe (1 epoch is enough) to make sure everything is fine.


4. Fix `recipes/LibriSpeech/ASR/transformer/hparams/train_hf_whisper.yaml` as mentioned by Titouan; currently it leads to a folder like <save_folder>/whisper_checkpoint/whisper_checkpoint.

5. Remove tests/coverage/Future.md

Also, I noticed that some libraries have little test coverage. We need to improve that in the future, but in another PR.

I have gone through these recipes. Please see my comment above. REAL-M and WHAMandWHAMR seem to be fine, but please see my comments. For BinauralWSJ0Mix, it seems that the added .unsqueeze() breaks the code.

@anautsch (Collaborator, Author):

> I have gone through these recipes. Please see my comment above. REAL-M and WHAMandWHAMR seem to be fine, but please see my comments. For BinauralWSJ0Mix, it seems that the added .unsqueeze() breaks the code.

Thank you @ycemsubakan! I removed the added .unsqueeze(), and tests for BinauralWSJ0Mix are passing now!

python -c 'from tests.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Dataset"], filters=[["BinauralWSJ0Mix"]], do_checks=False, run_opts="--device=cuda")) else print("TEST PASSED")'

@anautsch (Collaborator, Author) commented Feb 6, 2023

@ycemsubakan neat!

@TParcollet @mravanelli how'd you like to proceed with approving changes & merging this one?

@TParcollet (Collaborator) left a comment:

LGTM

@TParcollet merged commit 4b485a3 into speechbrain:develop Feb 7, 2023
@asumagic mentioned this pull request Jun 14, 2023