
VAD: very slow "apply_threshold" when running on a CUDA device (+ possible solution) #2638

@nie3e


Describe the bug

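When running VAD on a 6-hour audio file on a CUDA device, the apply_threshold function is notably slow. The loop inside this function is the main bottleneck: it reads individual elements of a CUDA tensor, and every "if" on such an element forces a GPU-to-CPU synchronization, which is repeated for every frame of the recording.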

        # Loop over batches and time steps
        for batch in range(vad_th.shape[0]):
            for time_step in range(vad_th.shape[1] - 1):
                if (
                    vad_th[batch, time_step] == 2
                    and vad_th[batch, time_step + 1] == 1
                ):
                    vad_th[batch, time_step + 1] = 2
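To make the semantics of this loop concrete: after vad_activation + vad_deactivation, each frame holds 0 (below deactivation_th), 1 (between the two thresholds) or 2 (at or above activation_th). The loop carries an open speech segment forward: a 1 that directly follows a 2 becomes a 2, so a segment stays open until the probability drops below deactivation_th. The sketch below reproduces the loop on a NumPy array; the function name propagate_activation and the sample values are illustrative only.

```python
import numpy as np


def propagate_activation(vad_th: np.ndarray) -> np.ndarray:
    """Copy of the loop above, operating on a 2-D (batch, time) NumPy array.

    Values: 0 = below deactivation_th, 1 = between the thresholds,
    2 = at or above activation_th.
    """
    out = vad_th.copy()  # leave the caller's array untouched
    for batch in range(out.shape[0]):
        for time_step in range(out.shape[1] - 1):
            # An open segment (2) extends into a following "between" frame (1).
            if out[batch, time_step] == 2 and out[batch, time_step + 1] == 1:
                out[batch, time_step + 1] = 2
    return out


vad_th = np.array([[0, 1, 2, 1, 1, 0, 1, 2, 0]])
print(propagate_activation(vad_th))
# [[0 1 2 2 2 0 1 2 0]]
```

The frames still equal to 1 afterwards (positions 1 and 6 here) are the ones later mapped to 0, because no segment was open when they occurred.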

Expected behaviour

It is quicker if we first move the vad_th tensor to the CPU: vad_th = vad_th.cpu()
It is even faster if we convert it to a NumPy array and convert it back to a tensor before returning the result:

vad_th = vad_th.cpu().numpy()
[...]
vad_th = torch.from_numpy(vad_th)

Fixed function:

    def apply_threshold(
        self, vad_prob, activation_th=0.5, deactivation_th=0.25
    ):
        """Scans the frame-level speech probabilities and applies a threshold
        on them. Speech starts when a value greater than or equal to
        activation_th is detected, while it ends when a value lower than
        deactivation_th is observed.

        Arguments
        ---------
        vad_prob: torch.Tensor
            Frame-level speech probabilities.
        activation_th:  float
            Threshold for starting a speech segment.
        deactivation_th: float
            Threshold for ending a speech segment.

        Returns
        -------
        vad_th: torch.Tensor
            Tensor containing 1 for speech regions and 0 for non-speech regions.
        """
        vad_activation = (vad_prob >= activation_th).int()
        vad_deactivation = (vad_prob >= deactivation_th).int()
        vad_th = vad_activation + vad_deactivation

        # Move tensor to cpu, make numpy array
        vad_th = vad_th.cpu().numpy()

        # Loop over batches and time steps
        for batch in range(vad_th.shape[0]):
            for time_step in range(vad_th.shape[1] - 1):
                if (
                    vad_th[batch, time_step] == 2
                    and vad_th[batch, time_step + 1] == 1
                ):
                    vad_th[batch, time_step + 1] = 2

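        # Get tensor from numpy and move it back to the input's device,
        # so the function returns a tensor on the same device as before
        vad_th = torch.from_numpy(vad_th).to(vad_prob.device)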
        vad_th[vad_th == 1] = 0
        vad_th[vad_th == 2] = 1
        return vad_th
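Moving the loop to the CPU removes the per-element GPU synchronization, but it is still a Python loop over every frame. As a possible further step (not part of the fix above and not SpeechBrain's implementation), the same propagation can be written with NumPy array operations: within each contiguous run of nonzero frames, every frame from the first 2 onward becomes 2. The sketch assumes vad_th is a 2-D integer array of 0/1/2 values, as produced by vad_th.cpu().numpy(); the helper names are hypothetical. It compares itself against the loop on random input.

```python
import numpy as np


def propagate_activation_vectorized(vad_th: np.ndarray) -> np.ndarray:
    """Loop-free equivalent of the batch/time-step loop (hypothetical helper).

    Within every contiguous run of nonzero frames, all frames from the first
    2 onward become 2; 1s before that first 2 stay 1.
    """
    frame_idx = np.arange(vad_th.shape[1])
    # Index of the most recent 0 at or before each frame (-1 if there is none).
    zero_pos = np.where(vad_th == 0, frame_idx, -1)
    last_zero = np.maximum.accumulate(zero_pos, axis=1)
    # Running count of 2s; subtracting the count at the last 0 gives the
    # number of 2s seen so far inside the current nonzero run.
    twos = np.cumsum(vad_th == 2, axis=1)
    twos_at_zero = np.take_along_axis(twos, np.clip(last_zero, 0, None), axis=1)
    twos_at_zero = np.where(last_zero >= 0, twos_at_zero, 0)
    in_open_segment = (twos - twos_at_zero) > 0
    return np.where((vad_th >= 1) & in_open_segment, 2, vad_th)


def propagate_activation_loop(vad_th: np.ndarray) -> np.ndarray:
    """The original loop, kept only as a reference for the comparison."""
    out = vad_th.copy()
    for batch in range(out.shape[0]):
        for time_step in range(out.shape[1] - 1):
            if out[batch, time_step] == 2 and out[batch, time_step + 1] == 1:
                out[batch, time_step + 1] = 2
    return out


rng = np.random.default_rng(0)
sample = rng.integers(0, 3, size=(4, 1000)).astype(np.int32)
print(np.array_equal(propagate_activation_vectorized(sample),
                     propagate_activation_loop(sample)))
```

Inside the fixed function, the double loop could then be replaced by vad_th = propagate_activation_vectorized(vad_th); the speed-up on a 6-hour file has not been measured here.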

To Reproduce

This is my code:

from time import perf_counter

from speechbrain.inference.VAD import VAD

# Use a lowercase name for the instance so it does not shadow the VAD class
vad = VAD.from_hparams(
    source="speechbrain/vad-crdnn-libriparty",
    savedir="pretrained_models/vad-crdnn-libriparty",
    huggingface_cache_dir="./",
    run_opts={"device": "cuda"},
)

s = perf_counter()
boundaries = vad.get_speech_segments(
    "path_to_my_6h_audio.wav"
)
e = perf_counter()

# Print the detected speech boundaries
vad.save_boundaries(boundaries)
print(f"It took {e - s:.2f} s")
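To time the thresholding step on its own, without downloading the model, one can generate synthetic frame labels of the same length as the recording. The frame count assumes a 10 ms frame shift (6 h × 3600 s/h ÷ 0.01 s = 2,160,000 frames); that value is an assumption and should be checked against the model's actual time resolution. The helper names below are hypothetical and the sketch uses NumPy only, so it measures the CPU/NumPy version of the loop, not the CUDA one.

```python
import time

import numpy as np


def synthetic_vad_th(hours: float, frame_shift_s: float = 0.01, seed: int = 0) -> np.ndarray:
    """Random 0/1/2 frame labels shaped (1, n_frames), like vad_th after .numpy().

    frame_shift_s = 0.01 (10 ms) is an assumption, not read from the model.
    """
    n_frames = int(round(hours * 3600 / frame_shift_s))
    rng = np.random.default_rng(seed)
    return rng.integers(0, 3, size=(1, n_frames)).astype(np.int32)


def time_loop(vad_th: np.ndarray) -> float:
    """Run the issue's loop on a copy of vad_th and return elapsed seconds."""
    out = vad_th.copy()
    start = time.perf_counter()
    for batch in range(out.shape[0]):
        for time_step in range(out.shape[1] - 1):
            if out[batch, time_step] == 2 and out[batch, time_step + 1] == 1:
                out[batch, time_step + 1] = 2
    return time.perf_counter() - start


vad_th = synthetic_vad_th(hours=0.1)  # 36,000 frames at the assumed 10 ms shift
print(vad_th.shape, f"{time_loop(vad_th):.3f} s")
```

Passing hours=6 gives the full-length case; since the loop is linear in the number of frames, timing a short slice first gives a rough estimate before running it.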

Environment Details

audioread==3.0.1
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
colorama==0.4.6
decorator==5.1.1
filelock==3.15.4
fsspec==2024.6.1
huggingface-hub==0.24.5
HyperPyYAML==1.2.2
idna==3.7
Jinja2==3.1.4
joblib==1.4.2
lazy_loader==0.4
librosa==0.10.2.post1
llvmlite==0.43.0
MarkupSafe==2.1.5
mpmath==1.3.0
msgpack==1.0.8
networkx==3.3
numba==0.60.0
numpy==1.26.4
packaging==24.1
pillow==10.2.0
platformdirs==4.2.2
pooch==1.8.2
pycparser==2.22
PyYAML==6.0.1
requests==2.32.3
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
scikit-learn==1.5.1
scipy==1.14.0
sentencepiece==0.2.0
soundfile==0.12.1
soxr==0.4.0
speechbrain==1.0.0
sympy==1.13.1
threadpoolctl==3.5.0
torch==2.4.0+cu121
torchaudio==2.4.0
torchvision==0.19.0+cu121
tqdm==4.66.4
typing_extensions==4.12.2
urllib3==2.2.2

Relevant Log Output

No response

Additional Context

No response

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
