Skip to content

Conversation

@asumagic
Copy link
Collaborator

@asumagic asumagic commented Oct 4, 2022

When using the dict variant of read_audio, omitting the "stop" key would fail due to a breaking change in torchaudio 0.8.0 (SB currently requires any version newer than 0.9.0).

While this was undocumented, the code seemed to intend it as possible.

This PR does several things:

  • The function was cleaned up, with the call to torchaudio.load fixed, keeping backwards-compatibility.

  • Added sanity checks, which raise an exception when the start-stop range seems incorrect. Let me know if these are overkill.

  • It attempts to improve the documentation for read_audio significantly, documenting edge cases and previously undocumented behavior, namely

    • Keys for the dict variant are documented properly, and it is explained which ones are optional.
    • Behavior for multi-channel files is now explained properly.
    • General cleanups and formatting improvements.

Following this PR, all of these now work:

# always has worked
a = read_audio("/path/to/file.wav")

# always has worked
b = read_audio({
    "file": "/path/to/file.wav",
    "start": 8000,
    "stop": 16000
})

# was failing, now works
c = read_audio({
    "file": "/path/to/file.wav"
})

# was failing, now works
d = read_audio({
    "file": "/path/to/file.wav",
    "start": 8000
})

Tests were added accordingly.

Root cause

Previously, the num_frames parameter defaulted to 0, which meant "load from frame_offset through the end of file".
This is not the case anymore: num_frames now defaults to -1, with the same meaning. However, passing 0 now fails.

Implementation details

In order to best match the past intended behavior, if start == stop, this PR leaves the num_frames parameter unspecified, which I feel is more intuitive than the previous code here.

@mravanelli
Copy link
Collaborator

@Adel-Moumen, could you please review it when you have time?

@Adel-Moumen
Copy link
Collaborator

@Adel-Moumen, could you please review it when you have time?

Yes, sure. 🙂

@Adel-Moumen Adel-Moumen self-assigned this Nov 2, 2022
Copy link
Collaborator

@Adel-Moumen Adel-Moumen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,

Thanks for the PR. Everything looks nice. I tried your PR and everything works as expected.
Could you please add your name at the top of the file?

Thanks again!

@asumagic asumagic requested a review from Adel-Moumen November 21, 2022 13:56
Copy link
Collaborator

@Adel-Moumen Adel-Moumen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Adel-Moumen Adel-Moumen merged commit 0d50564 into speechbrain:develop Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants