diff --git a/about.html b/about.html index 1d408a7..06cbaf3 100644 --- a/about.html +++ b/about.html @@ -4,11 +4,8 @@ - - - - - SpeechBrain + + About SpeechBrain @@ -23,40 +20,33 @@ - -
- -
+ +
+ +
@@ -73,112 +63,97 @@

A community toolkit

- +
-
-
-

Join our official Discourse to discuss with SpeechBrain users coming from all around the world!

-
-

Community driven

-

SpeechBrain has been designed to help researchers and developers. - The future of SpeechBrain highly depends on this community engagement. - Anyone can clone the code, develop new functionalities, raise issues, or create a new discussion. - With SpeechBrain we hope to start a symbiotic relationship between the toolkit and the research community.

+

🌐 Community driven

+

SpeechBrain has a large community of contributors. Anyone can contribute by cloning the code, developing new features, reporting issues, or joining discussions. +

-

Contributing to SpeechBrain

-

Contributions can take various forms (see our dedicated page):
- 1. Researchers can propose and implement new functionalities through pull requests. To ensure high-quality standards, each new pull request will be reviewed by a core team member.
- 2. Sponsors can contribute financially or with human resources by contacting us via email.

- SpeechBrain needs you in order to evolve! -

- +

💡 Contributing to SpeechBrain

+

+ Contributions to SpeechBrain come in various forms. See our dedicated page for details:

+ 1. Researchers: Propose and implement new functionalities through pull requests. Each new pull request undergoes review by a core team member to maintain high-quality standards.

+ + 2. Sponsors: Contribute financially or with human resources by reaching out to us via email. Also, check out our 2024 call for sponsors.

+ SpeechBrain needs your support to evolve! +

-

Referencing SpeechBrain

+

📖 Reference

                   
-  @misc{SB2021,
-    author = {Ravanelli, Mirco and Parcollet, Titouan and Rouhe, Aku and Plantinga, Peter and Rastorgueva, Elena and Lugosch, Loren and Dawalatabad, Nauman and Ju-Chieh, Chou and Heba, Abdel and Grondin, Francois and Aris, William and Liao, Chien-Feng and Cornell, Samuele and Yeh, Sung-Lin and Na, Hwidong and Gao, Yan and Fu, Szu-Wei and Subakan, Cem and De Mori, Renato and Bengio, Yoshua },
-    title = {SpeechBrain},
-    year = {2021},
-    publisher = {GitHub},
-    journal = {GitHub repository},
-    howpublished = {\url{https://github.com/speechbrain/speechbrain}},
-  }
+@misc{speechbrain,
+  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
+  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
+  year={2021},
+  eprint={2106.04624},
+  archivePrefix={arXiv},
+  primaryClass={eess.AS},
+  note={arXiv:2106.04624}
+}
                   
                 
-

License

+

📜 License

SpeechBrain is released under the Apache license, version 2.0. The Apache license is a popular BSD-like license. SpeechBrain can be redistributed for free, even for commercial purposes, although you can not take off the license headers (and under some circumstances you may have to distribute a license document). Apache is not a viral license like the GPL, which forces you to release your modifications to the source code. Also note that this project has no connection to the Apache Foundation, other than that we use the same license terms.


From a legal and copyright point of view, SpeechBrain has been created by Dr. Mirco Ravanelli and Dr. Titouan Parcollet. There is no legal institution associated as an owner of SpeechBrain.

-

Sponsors

-

- Sponsoring allows us to further expand the SpeechBrain team highly increasing the number of new features coming out. It also helps to ensure high quality standards by being able to properly manage the various issues and pull requests coming from the community. - If interested do not hesitate to contact us via e-mail . -

+

🎗️ Sponsors

+

+ Sponsoring allows us to keep expanding and maintaining SpeechBrain. + If you are interested in a sponsorship, check out the call for sponsors 2023! Do not hesitate to contact us via e-mail. -

+
+
+
+
baidu
+
hf
+
+
ovh
+
nle
+
+
lia
+
+
+
+
+

Previous Sponsors

-
-

Partners

-

Partners are organizations that dedicate important human (or hardware resources) to the SpeechBrain project. - They are involved in the decision process by participating in meetings.

+

Partners are organizations that dedicate important human (or hardware) resources to the SpeechBrain project. + They are involved in the decision process by participating in meetings.

+
+ +
+
-
-

Contributors

-

In this section we thanks all the peoples that contributed to SpeechBrain.

-
- -
diff --git a/benchmarks.html b/benchmarks.html new file mode 100644 index 0000000..8401e3b --- /dev/null +++ b/benchmarks.html @@ -0,0 +1,175 @@ + + + + + + + + How to Contribute + + + + + + + + + + + + + + + +
+ +
+ + + +
+ +
+ + + +
+
+

📊 Available Benchmarks

+

SpeechBrain aims to promote transparent and reproducible research.

+To advance this mission, we develop benchmarks that help researchers conduct fair, robust, and standardized performance comparisons. We have created a dedicated repository for this purpose.

+The following benchmarks are currently available: +

+
+
+ + SpeechBrain-MOABB Logo + +

SpeechBrain-MOABB is an open-source Python library for benchmarking deep neural networks applied to EEG signals.

This repository provides a set of recipes for processing electroencephalographic (EEG) signals based on the popular Mother of all BCI Benchmarks (MOABB), seamlessly integrated with SpeechBrain. +

+This package facilitates the integration and evaluation of new algorithms (e.g., a novel deep learning architecture or a novel data augmentation strategy) in standardized EEG decoding pipelines based on MOABB-supported tasks, i.e., motor imagery (MI), P300, and steady-state visual evoked potential (SSVEP). +

+ Reference Papers: +

+ + Davide Borra, Francesco Paissan, and Mirco Ravanelli. SpeechBrain-MOABB: An open-source Python library for benchmarking deep neural networks applied to EEG signals. Computers in Biology and Medicine, Volume 182, 2024. [Paper] +

+ Davide Borra, Elisa Magosso, and Mirco Ravanelli. Neural Networks, Page 106847, 2024. [Paper] +

+


+ + + DASB Logo + +

DASB - Discrete Audio and Speech Benchmark is a benchmark for evaluating discrete audio representations using popular audio tokenizers like EnCodec, DAC, and many more, integrated with SpeechBrain. +

+The package helps integrate and evaluate new audio tokenizers in speech tasks of great interest such as speech recognition, speaker identification, emotion recognition, keyword spotting, intent classification, speech enhancement, separation, text-to-speech, and many more. +

+It offers an interface for easy model integration and testing and a protocol for comparing different audio tokenizers. +

+ Reference Paper: +

+ Pooneh Mousavi, Luca Della Libera, Jarod Duret, Artem Ploujnikov, Cem Subakan, Mirco Ravanelli, + DASB - Discrete Audio and Speech Benchmark, 2024. + arXiv preprint arXiv:2406.14294. +[Paper] +

+


+ +

CL-MASR: Continual Learning for Multilingual ASR

+

CL-MASR +is a Continual Learning Benchmark for Multilingual ASR. +

+It includes scripts to train Whisper and WavLM-based ASR systems on a subset of 20 languages selected from Common Voice 13 in a continual learning fashion using a handful of methods including rehearsal-based, architecture-based, and regularization-based approaches. +

+The goal is to continually learn new languages while limiting the forgetting of previously learned ones. + +

+An ideal method should achieve both positive forward transfer (i.e. improve performance on new tasks leveraging shared knowledge from previous tasks) and positive backward transfer (i.e. improve performance on previous tasks leveraging shared knowledge from new tasks). +
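+To make these notions precise, here is one common formalization, given only as a sketch: it follows the standard continual-learning definitions of Lopez-Paz & Ranzato (2017), and the exact metrics reported by CL-MASR may differ in detail. Let R_{i,j} be the performance on language j after training on the first i of T languages, and let b_j be the performance of the initial (untrained) model on language j:

    BWT = \frac{1}{T-1} \sum_{i=1}^{T-1} \left( R_{T,i} - R_{i,i} \right),
    \qquad
    FWT = \frac{1}{T-1} \sum_{i=2}^{T} \left( R_{i-1,i} - b_i \right)

Positive BWT means that learning later languages improved performance on earlier ones, while positive FWT means that knowledge from earlier languages helped on languages not yet trained on; with an error metric such as WER (lower is better), the signs are interpreted accordingly.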

+Reference Paper: +

+ +Luca Della Libera, Pooneh Mousavi, Salah Zaiem, Cem Subakan, Mirco Ravanelli, (2024). CL-MASR: A continual learning benchmark for multilingual ASR. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 32, 4931–4944. +[Paper] +

+


+ + +

MP3S: Multi-probe Speech Self Supervision Benchmark

+

MP3S - Multi-probe Speech Self Supervision Benchmark aims to evaluate self-supervised representations on various downstream tasks, including ASR, speaker verification, emotion recognition, and intent classification. +

+The key feature of this benchmark is that it allows users to choose their desired probing head for downstream training. +

+This is why we called it the Multi-probe Speech Self Supervision Benchmark (MP3S). It has been demonstrated that the performance of the model is greatly influenced by this selection. +

+ Reference Papers: +

+ +Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli, (2023). Speech Self-Supervised Representation Benchmarking: Are We Doing it Right? Proceedings of Interspeech 2023 +[Paper] +

+Salah Zaiem, Youcef Kemiche, Titouan Parcollet, Slim Essid, Mirco Ravanelli, (2023). Speech self-supervised representations benchmarking: a case for larger probing heads. Computer Speech & Language, 89, 101695.
+[Paper] +

+


+ + +
+ + + + + + + + + + + + + + + + + + + + + + + + diff --git a/contributing.html b/contributing.html index 1f462cc..33aed4b 100644 --- a/contributing.html +++ b/contributing.html @@ -4,11 +4,8 @@ - - - - - SpeechBrain + + How to Contribute @@ -24,39 +21,32 @@ -
- -
+
+ +
@@ -66,86 +56,107 @@

Contributing

-

Everyone is welcome

+

Everyone is welcome!

- +
-
-
-

Join our official Discourse to discuss with SpeechBrain users coming from all around the world!

-
-

Contributing to the code of SpeechBrain

-

The goal is to write a set of libraries that process audio and speech in several ways. - It is crucial to write a set of homogeneous libraries that are all compliant to a set of guidelines all described - in our documentation . +

🚀 Contributing to SpeechBrain

+

The goal is to collectively write a set of open-source libraries for Conversational AI. + It is crucial that these libraries remain homogeneous and compliant with the guidelines described + in our documentation.

-

Zen of SpeechBrain

-

SpeechBrain could be used for research, academic, commercial, non-commercial purposes. - Ideally, the code should have the following features:


- -

Simple: the code must be easy to understand even by students or by users that are not professional programmers or speech researchers. - Try to design your code such that it can be easily read. Given alternatives with the same level of performance, code the simplest one (i.e the most explicit manner is preferred).

- Readable: SpeechBrain mostly adopts the code style conventions of PEP8. The code written by the users must be compliant with that.

- Efficient: The code should be as efficient as possible. Contributors should maximize the use of pytorch native operations. Test the code carefully with your favorite profiler (e.g, torch.utils.bottleneck https://pytorch.org/docs/stable/bottleneck.html ) to make sure there are no bottlenecks in your code. Since we are not working in c++ directly, the speed can be an issue.

- Modular: Write your code such that it is very modular and fits well with the other functionalities of the toolkit. The idea is to develop a bunch of models that can be naturally interconnected with each other.

- Well documented: Given the goals of SpeechBrain, writing rich and good documentation is a crucial step. -

-

How to get my code into SpeechBrain?

-

SpeechBrain is hosted via GitHub . Hence, the process of integrating your code - to the toolkit will be done via this plateform. Then, three steps can be followed:

- 1. Fork, clone the repository and install our test suite as detailled in the documentation .
- 2. Add your code and make sure that the tests still run properly. Commit your changes to your fork with our pre-commit tests to ensure the normalisation of your code. +
+

🌟 Zen of SpeechBrain

+

SpeechBrain can be used for research, academic, commercial, and non-commercial purposes. If you want to contribute, keep in mind the following principles: +

+ Simplicity: The code must be easy to understand, even by students or users who are not professional programmers or speech researchers. + Design your code such that it can be easily read. Given alternatives with the same level of performance, code the simplest one.

+ Modularity: Write your code to be modular and well-fitting with the other functionalities of the toolkit. The idea is to develop a bunch of models that can be naturally interconnected with each other.

+ Efficiency: The code should be as efficient as possible. Contributors should maximize the use of native PyTorch operations. +

+ Documentation: Given the goals of SpeechBrain, writing rich and good documentation is a crucial step. Write docstrings with runnable examples (as done in PyTorch code). +
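To make the docstring guideline concrete, here is a minimal sketch of the expected style. The toy module below is hypothetical and written only for illustration; it is not part of the SpeechBrain codebase, but it follows the PyTorch-style docstring with a runnable doctest Example:

import torch

class Scale(torch.nn.Module):
    """Multiplies the input tensor by a constant factor.

    Arguments
    ---------
    factor : float
        The scaling factor applied to the input.

    Example
    -------
    >>> layer = Scale(factor=2.0)
    >>> layer(torch.ones(3))
    tensor([2., 2., 2.])
    """

    def __init__(self, factor):
        super().__init__()
        self.factor = factor

    def forward(self, x):
        """Returns the input scaled by the constant factor."""
        return self.factor * x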

+

🔧 How to get my code into SpeechBrain?

+

SpeechBrain is hosted on GitHub. Contributing requires three steps:

+ 1. Fork and clone the repository, then install our test suite as detailed in the documentation.
+ 2. Write your code and test it properly. Commit your changes to your fork with our pre-commit tests to ensure tests are passing. Then open a pull request on the official repository.
3. Participate in the review process. Each pull request is reviewed by one or two reviewers. - Please integrate their feedbacks into your code. Once reviewers are happy with your pull request, they will merge it into the official code.

- Details about this process (i.e including steps for installating the tests) are given in the documentation . + Please integrate their feedback into your code. Once reviewers are happy with your pull request, they will merge it into the official code.

+ Details about this process (including steps for installing the tests) are given in the documentation.

-

How can I help?

-

Interractions between speech technologies and deep learning are various and numerous. - Therefore, we do not provide any official development directions. Instead, we believe that the toolkit will evolve accordingly to the needs - of the different research fields and industry. Examples of contributions include new recipes, better models for higher performance, - new external functionalities, or even core changes and extensions. While we do not provide any instructions on the potential interesting applications, - feel free to jump into our Discourse or GitHub to see if any existing issue remains unsolved! +

+

🙌 How can I help?

+

Examples of contributions include new recipes, new models, new external functionalities, and fixes for reported issues and bugs.

-
-

Contributors

-

In this section, we thank all the people who contributed to SpeechBrain.

-
-
+
  • Aku Rouhe, Aalto University (FI)
  • +
  • Adel Moumen, University of Cambridge (UK)
  • +
  • Sylvain de Langen, Avignon Université (LIA, FR)
  • +
  • Cem Subakan, Mila, Laval University (CA)
  • +
  • Luca Della Libera, Concordia University, Mila (CA)
  • +
  • Pooneh Mousavi, Concordia University, Mila (CA)
  • +
  • Artem Ploujnikov, Université de Montréal, Mila (CA)
  • +
  • Davide Borra, University of Bologna (IT)
  • +
  • Francesco Paissan, Fondazione Bruno Kessler (IT)
  • +
  • Mahed Mousavi, University of Trento (IT)
  • +
  • Salah Zaiem, Telecom Paris (FR)
  • +
  • Zeyu Zhao, University of Edinburgh (UK)
  • +
  • Pierre Champion, INRIA (FR)
  • +
  • Georgios Karakasidis, University of Edinburgh (UK)
  • +
  • Sung-Lin Yeh, University of Edinburgh (UK)
  • +
  • Yingzhi Wang, Zaion (FR)
  • +
  • Dongwon Kim, Krafton AI (KR)
  • +
  • Xuechen Liu, Aalto University (FI)
  • +
  • Andreas Nautsch, Avignon Université (LIA, FR)
  • +
  • Pradnya Kandarkar, Concordia University
  • +
  • Jarod Duret, Avignon Université (LIA, FR)
  • +
  • Sangeet Sagar, Saarland University (DE)
  • +
  • Gaëlle Laperrière, Avignon Université (LIA, FR)
  • +
  • Ha Nguyen, Oracle (FR)
  • +
  • Pablo Zuluaga, École Polytechnique Fédérale de Lausanne (EPFL, CH)
  • +
  • Florian Mai, École Polytechnique Fédérale de Lausanne (EPFL, CH)
  • +
  • Loren Lugosch, Mila, McGill University (CA)
  • +
  • Nauman Dawalatabad, Indian Institute of Technology Madras (IN)
  • +
  • Ju-Chieh Chou, National Taiwan University (TW)
  • +
  • Abdel Heba, Linagora / University of Toulouse (IRIT, FR)
  • +
  • Francois Grondin, University of Sherbrooke (CA)
  • +
  • William Aris, University of Sherbrooke (CA)
  • +
  • Chien-Feng Liao, National Taiwan University (TW)
  • +
  • Samuele Cornell, Università Politecnica delle Marche (IT)
  • +
  • Sung-Lin Yeh, National Tsing Hua University (TW)
  • +
  • Hwidong Na, Visiting Researcher Samsung SAIL (CA)
  • +
  • Yan Gao, University of Cambridge (UK)
  • +
  • Szu-Wei Fu, Academia Sinica (TW)
  • +
  • Elena Rastorgueva, University of Cambridge (UK)
  • +
  • Jianyuan Zhong, University of Rochester (USA)
  • +
  • Brecht Desplanques, Ghent University (BE)
  • +
  • Jenthe Thienpondt, Ghent University (BE)
  • +
  • Salima Mdhaffar, Avignon Université (LIA, FR)
  • +
  • Mickael Rouvier, Avignon University (LIA, FR)
  • +
  • Yannick Estève, Avignon University (LIA, FR)
  • +
  • Renato De Mori, McGill University (CA), Avignon University (LIA, FR)
  • +
  • Yoshua Bengio, Mila, University of Montréal (CA)
  • + +
    diff --git a/css/responsive.css b/css/responsive.css index da339d3..a070200 100644 --- a/css/responsive.css +++ b/css/responsive.css @@ -277,9 +277,6 @@ .blog_banner .banner_inner .blog_b_text { margin-top: 0px; } - .home_banner_area .banner_inner .banner_content img{ - display: none; - } .home_banner_area .banner_inner .banner_content h5 { margin-top: 0px; } diff --git a/css/style.css b/css/style.css index 5bebcdc..42b5731 100644 --- a/css/style.css +++ b/css/style.css @@ -361,7 +361,7 @@ button:focus { .home_banner_area .banner_inner { width: 100%; } - .home_banner_area .banner_inner .home_left_img { + .home_banner_area .banner_inner { padding-top: 230px; padding-bottom: 230px; } .home_banner_area .banner_inner .col-lg-7 { @@ -472,6 +472,43 @@ button:focus { .banner_area .banner_inner .banner_content .page_link a:last-child:before { display: none; } +.discord-btn { + background-color: #5865F2; + color: white; + padding: 6px 12px; + border-radius: 6px; + text-decoration: none; + font-size: 14px; + font-weight: 500; + display: inline-flex; + align-items: center; + height: 30px; + box-sizing: border-box; + transition: all 0.3s ease; +} + +.discord-btn:hover { + background-color: #4752C4; + transform: translateY(-1px); + box-shadow: 0 4px 8px rgba(88, 101, 242, 0.3); + color: white; + text-decoration: none; +} + +.discord-btn:active { + transform: translateY(0); + box-shadow: 0 2px 4px rgba(88, 101, 242, 0.2); +} + +.discord-btn svg { + margin-right: 6px; + transition: transform 0.2s ease; +} + +.discord-btn:hover svg { + transform: scale(1.1); +} + /* End Home Banner Area css ============================================================================================ */ /*---------------------------------------------------- */ @@ -899,6 +936,86 @@ button:focus { background-color: #3bacf0; border-color: #eee; } +.tabulation { + display: inline-block; + width: 4ch; +} + +.speaker-cards { + display: flex; + flex-wrap: wrap; + justify-content: center; + gap: 20px; + margin-top: 50px; +} + +.speaker-card { + display: flex; + flex-direction: column; + align-items: center; + width: 550px; + border: 1px solid #ccc; + padding: 20px; + box-shadow: 0px 0px 5px rgba(0, 0, 0, 0.2); + border-radius: 5px; +} + +.speaker-image { + width: 100%; + max-width: 200px; + height: 200px; + border-radius: 50%; + overflow: hidden; + margin-bottom: 20px; +} + + +.speaker-image img { + width: 100%; + height: 100%; + object-fit: cover; +} + +.speaker-info { + text-align: left; + justify-content: left; +} + +.speaker-info h3 { + font-size: 1.5rem; + margin-bottom: 10px; + color: black; +} + +.speaker-info p { + font-size: 1rem; + line-height: 1.5; + text-align: justify; +} + + +.speaker-info .title { + color: #38a4ff; +} + +.speaker-info h3 { + color: black; +} + +.speaker-info h4 { + color: black; +} + +.speaker-info p { + color: black; +} + +.speaker-info i { + font-style: italic; +} + + + /*============ Start Blog Single Styles =============*/ .single-post-area .social-links { padding-top: 10px; } diff --git a/img/Call_for_Sponsors_Sept_2021.pdf b/img/Call_for_Sponsors_Sept_2021.pdf new file mode 100644 index 0000000..d410b1f Binary files /dev/null and b/img/Call_for_Sponsors_Sept_2021.pdf differ diff --git a/img/benchmarks/DASB_logo.png b/img/benchmarks/DASB_logo.png new file mode 100644 index 0000000..97a5d1d Binary files /dev/null and b/img/benchmarks/DASB_logo.png differ diff --git a/img/benchmarks/ben.txt b/img/benchmarks/ben.txt new file mode 100644 index 0000000..8b13789 --- /dev/null +++ 
b/img/benchmarks/ben.txt @@ -0,0 +1 @@ + diff --git a/img/benchmarks/sb-moabb-logo.svg b/img/benchmarks/sb-moabb-logo.svg new file mode 100644 index 0000000..091a60c --- /dev/null +++ b/img/benchmarks/sb-moabb-logo.svg @@ -0,0 +1,773 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + MOABB + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/img/favicon.ico b/img/favicon.ico new file mode 100644 index 0000000..9e9ab10 Binary files /dev/null and b/img/favicon.ico differ diff --git a/img/logo_line.png b/img/logo_line.png deleted file mode 100644 index 07da095..0000000 Binary files a/img/logo_line.png and /dev/null differ diff --git a/img/logo_line_big.png b/img/logo_line_big.png deleted file mode 100644 index c4f661c..0000000 Binary files a/img/logo_line_big.png and /dev/null differ diff --git a/img/logo_line_verysmall.png b/img/logo_line_verysmall.png deleted file mode 100644 index 6c8e533..0000000 Binary files a/img/logo_line_verysmall.png and /dev/null differ diff --git a/img/logo_noname_rounded_small.png b/img/logo_noname_rounded_small.png deleted file mode 100644 index 02ff38f..0000000 Binary files a/img/logo_noname_rounded_small.png and /dev/null differ diff --git a/img/logo_noname_rounded_very_small.png b/img/logo_noname_rounded_very_small.png deleted file mode 100644 index 575f51b..0000000 Binary files a/img/logo_noname_rounded_very_small.png and /dev/null differ diff --git a/img/logo_small_line.png b/img/logo_small_line.png deleted file mode 100644 index 73b7c5f..0000000 Binary files a/img/logo_small_line.png and /dev/null differ diff --git a/img/partners/logo_selma.png b/img/partners/logo_selma.png new file mode 100644 index 0000000..53fd93d Binary files /dev/null and b/img/partners/logo_selma.png differ diff --git a/img/speechbrain-horiz-logo.svg b/img/speechbrain-horiz-logo.svg new file mode 100644 index 0000000..71f81eb --- /dev/null +++ b/img/speechbrain-horiz-logo.svg @@ -0,0 +1,142 @@ + + + + + + + + + + + + + + + + + + + + + diff --git a/img/speechbrain-logo-brain-banner.png b/img/speechbrain-logo-brain-banner.png new file mode 100644 index 0000000..e32b51d Binary files /dev/null and b/img/speechbrain-logo-brain-banner.png differ diff --git a/img/speechbrain-round-logo.svg b/img/speechbrain-round-logo.svg new file mode 100644 index 0000000..6c8177d --- /dev/null +++ b/img/speechbrain-round-logo.svg @@ -0,0 +1,16 @@ + + + + + + + + + + + + + + + + diff --git a/img/sponsors/concordia.png b/img/sponsors/concordia.png new file mode 100644 index 0000000..2a27ff1 Binary files /dev/null and b/img/sponsors/concordia.png differ diff --git a/img/sponsors/logo_badu.png b/img/sponsors/logo_badu.png new file mode 100644 index 0000000..04f1a4e Binary files /dev/null and b/img/sponsors/logo_badu.png differ diff --git a/img/sponsors/logo_nle.png b/img/sponsors/logo_nle.png new file mode 100644 index 0000000..93a5bf5 Binary files /dev/null and b/img/sponsors/logo_nle.png differ diff --git a/img/sponsors/logo_ovh.png b/img/sponsors/logo_ovh.png new file mode 100644 index 0000000..eb74dd0 Binary files /dev/null and b/img/sponsors/logo_ovh.png differ diff --git a/img/summit/Photo_ArianeNabethHalber_Mars2019_11 - copie.png b/img/summit/Photo_ArianeNabethHalber_Mars2019_11 - copie.png new file mode 100644 index 0000000..f08006f Binary files /dev/null and 
b/img/summit/Photo_ArianeNabethHalber_Mars2019_11 - copie.png differ diff --git a/img/summit/adel_moumen_2023.png b/img/summit/adel_moumen_2023.png new file mode 100644 index 0000000..5cb9088 Binary files /dev/null and b/img/summit/adel_moumen_2023.png differ diff --git a/img/summit/andreas.png b/img/summit/andreas.png new file mode 100644 index 0000000..95d542a Binary files /dev/null and b/img/summit/andreas.png differ diff --git a/img/summit/arsenii.png b/img/summit/arsenii.png new file mode 100644 index 0000000..b687b71 Binary files /dev/null and b/img/summit/arsenii.png differ diff --git a/img/summit/brianmcfee.jpg b/img/summit/brianmcfee.jpg new file mode 100644 index 0000000..33f14b4 Binary files /dev/null and b/img/summit/brianmcfee.jpg differ diff --git a/img/summit/cem4_c.jpg b/img/summit/cem4_c.jpg new file mode 100644 index 0000000..e1c6748 Binary files /dev/null and b/img/summit/cem4_c.jpg differ diff --git a/img/summit/danpovey.jpg b/img/summit/danpovey.jpg new file mode 100644 index 0000000..967821f Binary files /dev/null and b/img/summit/danpovey.jpg differ diff --git a/img/summit/francois.jpeg b/img/summit/francois.jpeg new file mode 100644 index 0000000..438c204 Binary files /dev/null and b/img/summit/francois.jpeg differ diff --git a/img/summit/headshot-peter-plantinga.jpg b/img/summit/headshot-peter-plantinga.jpg new file mode 100644 index 0000000..22dee8a Binary files /dev/null and b/img/summit/headshot-peter-plantinga.jpg differ diff --git a/img/summit/photo_mirco_ravanelli.jpg b/img/summit/photo_mirco_ravanelli.jpg new file mode 100644 index 0000000..ef28f82 Binary files /dev/null and b/img/summit/photo_mirco_ravanelli.jpg differ diff --git a/img/summit/sanchit.jpeg b/img/summit/sanchit.jpeg new file mode 100644 index 0000000..f01e860 Binary files /dev/null and b/img/summit/sanchit.jpeg differ diff --git a/img/summit/shinji.jpeg b/img/summit/shinji.jpeg new file mode 100644 index 0000000..d94d59f Binary files /dev/null and b/img/summit/shinji.jpeg differ diff --git a/img/summit/titouan_parcollet.jpg b/img/summit/titouan_parcollet.jpg new file mode 100644 index 0000000..636be50 Binary files /dev/null and b/img/summit/titouan_parcollet.jpg differ diff --git a/img/summit/vielzeuf.jpg b/img/summit/vielzeuf.jpg new file mode 100644 index 0000000..28cdaae Binary files /dev/null and b/img/summit/vielzeuf.jpg differ diff --git a/img/summit/yan_gao.jpg b/img/summit/yan_gao.jpg new file mode 100644 index 0000000..4a3edea Binary files /dev/null and b/img/summit/yan_gao.jpg differ diff --git a/img/summit/yannick22.jpeg b/img/summit/yannick22.jpeg new file mode 100644 index 0000000..ba7e131 Binary files /dev/null and b/img/summit/yannick22.jpeg differ diff --git a/img/summit/zhaoheng.png b/img/summit/zhaoheng.png new file mode 100644 index 0000000..43a8fc8 Binary files /dev/null and b/img/summit/zhaoheng.png differ diff --git a/img/youtube.svg b/img/youtube.svg new file mode 100644 index 0000000..18198c3 --- /dev/null +++ b/img/youtube.svg @@ -0,0 +1,11 @@ + + + + + + + \ No newline at end of file diff --git a/index.html b/index.html index 3f5b438..858a139 100644 --- a/index.html +++ b/index.html @@ -5,12 +5,8 @@ - - - - - - SpeechBrain + + SpeechBrain: Open-Source Conversational AI for Everyone @@ -23,6 +19,17 @@ + + + + + @@ -30,35 +37,28 @@
    @@ -71,94 +71,100 @@
    - +
    - +
    +
    +
    + + + + + + + + Discord + +
    +
    +
    + - + + +
    -
    +

    Key Features

    -

    SpeechBrain is an open-source and all-in-one speech toolkit. - It is designed to be simple, extremely flexible, and user-friendly. State-of-the-art performance are obtained in various domains.

    +

    Open, simple, flexible, well-documented, and with competitive performance.

    -

    Speech Recognition

    -

    SpeechBrain supports state-of-the-art methods for end-to-end - speech recognition, including models based on CTC, CTC+attention, transducers, transformers, - and neural language models relying on recurrent neural networks and transformers.

    +

    Speech

    +

    SpeechBrain supports state-of-the-art technologies for speech recognition, enhancement, separation, text-to-speech, speaker recognition, speech-to-speech translation, spoken language understanding, and beyond.

    - -

    Speaker Recognition

    -

    Speaker recognition is already deployed in a wide variety of realistic applications. - SpeechBrain provides different models for speaker recognition, including X-vector, ECAPA-TDNN, PLDA, contrastive learning -

    + +

    Audio

    +

    SpeechBrain encompasses a wide range of audio technologies, including vocoding, audio augmentation, feature extraction, sound event detection, beamforming, and other multi-microphone signal processing capabilities.

    - -

    Speech Enhancement

    -

    Spectral masking, spectral mapping, and time-domain enhancement are different methods already available within SpeechBrain. - Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well.

    -
    -
    -
    -
    - -

    Speech Processing

    -

    SpeechBrain provides efficient and GPU-friendly speech augmentation pipelines and acoustic features extraction, normalisation - that can be used on-the-fly during your experiment. -

    + +

    Text

    +

    SpeechBrain offers user-friendly tools for training Language Models, supporting technologies ranging from basic n-gram LMs to modern Large Language Models. Our platform seamlessly integrates them into speech processing pipelines and facilitates the creation of customizable chatbots.

    -
    -
    - -

    Multi Microphone Processing

    -

    Combining multiple microphones is a powerful approach to achieve robustness in adverse acoustic environments. - SpeechBrain provides various techniques for beamforming (e.g, delay-and-sum, MVDR, and GeV) and speaker localization. -

    -
    -
    -
    +
    +
    + +

    Technology

    +

    SpeechBrain leverages the most advanced deep learning technologies, including methods for self-supervised learning, continual learning, diffusion models, Bayesian deep learning, and interpretable neural networks.

    +
    +
    +
    + +
    - +

    Research & Development

    -

    SpeechBrain is designed to speed-up research and development of speech technologies. - It is modular, flexible, easy-to-customize, and contains several recipes for popular datasets. Documentation and tutorials are here to - help newcomers using SpeechBrain.

    +

    SpeechBrain is engineered to accelerate the research and development of Conversational AI technologies. It comes with pre-built recipes for popular datasets. Extensive documentation and tutorials are available to support newcomers.

    HuggingFace!

    -

    SpeechBrain provides multiple pre-trained models that can - easily be deployed with nicely designed interfaces. - Transcribing, verifying speakers, enhancing speech, separating sources have never been that easy!

    +

    SpeechBrain offers pre-trained models with user-friendly interfaces, making tasks like transcription, speaker verification, speech enhancement, and source separation easier than ever.

    @@ -176,9 +182,6 @@

    Key Features

    Why SpeechBrain?

    -

    SpeechBrain allows you to easily and quickly customize any part of your speech pipeline - ranging from the data management up to the downstream task metric.
    - No existing speech toolkit provides such a level of accessibility.

    diff --git a/sb_summit2023.html b/sb_summit2023.html new file mode 100644 index 0000000..059f948 --- /dev/null +++ b/sb_summit2023.html @@ -0,0 +1,444 @@ + + + + + + + + About SpeechBrain + + + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    + + + +
    +
    +

    Let’s Talk About Speech Technologies

    + +

    Official recordings are available at this link!

    + +

    + SpeechBrain has established itself as a leading deep learning toolkit + for speech processing in recent years, with impressive usage statistics + to back it up. With an average of 100 daily clones and 1000 daily + downloads on its GitHub repository, along with over 6,000 stars and 1100 + forks, SpeechBrain is a popular choice among speech processing + experts. +

    +
    +

    + In this summit, we are excited to share the latest developments and + updates on the SpeechBrain project, engage in an open and collaborative + discussion with the community, and introduce it to a broader audience of + speech professionals. We would like participants to stay up-to-date with + the latest advancements made in SpeechBrain and speech processing + technology. We also wish to gather, more interactively, the feedback + from the community to better plan future developments. The event will + take place four days after the main conference on August 28th. +

    +
    + +

    We are passionate about organizing this event for several reasons:

    + The field of speech technology has seen tremendous growth in recent + years. Our goal is to keep the community informed about the latest + developments and future plans for the SpeechBrain project. We also aim + to engage in an open dialogue with the community to set ambitious goals + for the project’s future.

    + We are excited to bring together experts from both industry and + academia to showcase their impactful projects and share their knowledge + with the community.

+ This event is not only an opportunity to learn and stay updated, but also to network and connect with like-minded individuals in the SpeechBrain community. + Don't miss this chance to be a part of shaping the future of speech technologies and building valuable connections within the community. +

    +
    +

    Tentative Schedule

    +

The event will start at 9 am Eastern Daylight Time (EDT) on August 28th, and we will have two sessions with a break from 11.30 am to 12.00 pm.

    + Morning (9.00 am - 11.30 am):
+ 9.00 am - 9.30 am: Opening and thanks to Sponsors
+ 9.30 am - 10.00 am: Industry Talk 1: Arsenii Gorin (UbenwaAI)
+ 10.00 am - 10.30 am: Academic Talk 1: Yannick Estève (Université Avignon)
+ 10.30 am - 11.00 am: Industry Talk 2: Ariane Nabeth Halber (ViaDialog)
+ 11.00 am - 11.30 am: Academic Talk 2: Yan Gao (University of Cambridge)

    + Lunch Break (11.30 am - 12.00 pm)

    + Afternoon (12.00 pm - 4.30 pm):
+ 12.00 pm - 1.00 pm: SpeechBrain Roadmap 2023 & latest updates
+ 1.00 pm - 1.30 pm: Industry Talk 3: Peter Plantinga (JP Morgan Chase)
+ 1.30 pm - 2.00 pm: Academic Talk 4: Valentin Vielzeuf (Orange Labs)
+ 2.00 pm - 2.30 pm: Coffee Break
+ 2.30 pm - 4.15 pm: Panel Discussion and Q&A:
Shinji Watanabe, Dan Povey, Brian McFee, Sanchit Gandhi, Zhaoheng Ni
+ 4.15 pm - 4.30 pm: Final Remarks and Closing
    +

    + +
    +

    Invited Speakers

    +
    +
    +
    +
    + +
    +
    +

    Peter Plantinga

    +

    JP Morgan Chase & Co

    +

    Continual Learning for End-to-End ASR by Averaging Domain Experts

    +

    + Peter Plantinga is an Applied AI/ML Associate at the Machine Learning Center of Excellence at JP Morgan Chase & Co. He received his PhD in computer science in 2021 from the Ohio State University (USA) under Prof. Eric Fosler-Lussier focusing on knowledge transfer for the tasks of speech enhancement, robust ASR, and reading verification. His current work involves adapting large-scale ASR models to the financial domain without forgetting, as well as better evaluations of ASR models. +

    +
    +
    +
    +
    + +
    +
    +

    Ariane Nabeth Halber

    +

    ViaDialog

    +

How ViaDialog sponsors SpeechBrain and brings hyperlarge vocabulary speech technologies to the contact centres

    +

    + Ariane Nabeth-Halber has been working in the speech industry for 25 years. She started her career in research (ATR, Japan; Thalès, France), and then moved to the speech industry, namely with Nuance Communication, and French company Bertin IT, working there with contact centres, broadcasters, trading floors and public ministries, but also academic labs such as LIUM and Avignon University/LIA. Since August 2021, Ariane Nabeth-Halber leads the Speech and Conversational AI team at ViaDialog, to deliver efficient and safe customer relationship experiences. A European Commission expert and LT-Innovate board member, Ariane holds a PhD in computer science and signal processing from Telecom ParisTech. She regularly speaks at conferences on AI and speech technology. +

    +
    +
    +
    + +
    +
    +
    + +
    +
    +

    Arsenii Gorin

    +

    UbenwaAI

    +

    Deep learning for infant cry classification

    +

+ Arsenii is the lead ML research scientist at Ubenwa. He obtained a PhD from Université de Lorraine, working on Automatic Speech Recognition. His main research interests are practical applications of machine learning techniques for audio and speech processing. +

    +
    +
    +
    +
    + +
    +
    +

Valentin Vielzeuf

    +

    Orange Labs

    +

Speech Recognition Toolkits in Focus: Analyzing SpeechBrain's Advantages and Drawbacks through Some of Orange's Project Examples

    +

+ Valentin Vielzeuf is currently a researcher at Orange, focusing on Speech Recognition, Spoken Language Understanding, and Complexity Reduction. His PhD thesis was on Multimodal Deep Learning, and he also has a background in Computer Vision. +

    +
    +
    +
    + + +
    +
    +
    + +
    +
    +

    Yan Gao

    +

    University of Cambridge

    +

    Federated self-supervised speech representation learning

    +

Yan Gao is a final-year PhD student in the Machine Learning Systems lab at the University of Cambridge, supervised by Prof. Nicholas Lane. His research interests are in machine learning, deep learning, and optimisation. His recent topics are in federated learning with self-supervised learning on audio and vision data.

    +
    +
    +
    +
    + +
    +
    +

    Yannick Estève

    +

    Avignon Université

    +

Advancing research: some examples of SpeechBrain's potential in the context of the LIAvignon partnership chair

    +

+ Yannick received his M.S. (1998) in computer science from Aix-Marseille University and his Ph.D. (2002) from Avignon University, France. + He joined Le Mans Université (LIUM lab) in 2003 as an associate professor and became a full professor in 2010. He moved to Avignon University in 2019 and has been the head of the Computer Science Laboratory of Avignon (LIA) since 2020. He has authored and co-authored more than 150 journal and conference papers in speech and language processing.

    +
    +
    +
    + + +
    +
    +
    +

    Invited Panel Discussions

    +
    + +
    +
    +
    + +
    +
    +

    Shinji Watanabe

    +

    ESPNet

    +

+ Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda University, Tokyo, Japan. He was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan, from 2001 to 2011, a visiting scholar at the Georgia Institute of Technology, Atlanta, GA, in 2009, and a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, from 2012 to 2017. Before Carnegie Mellon University, he was an associate research professor at Johns Hopkins University, Baltimore, MD, USA, from 2017 to 2020. His research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing. He has published over 300 papers in peer-reviewed journals and conferences and received several awards, including the best paper award from the IEEE ASRU in 2019. He is a Senior Area Editor of the IEEE Transactions on Audio, Speech, and Language Processing. He was/has been a member of several technical committees, including the APSIPA Speech, Language, and Audio Technical Committee (SLA), IEEE Signal Processing Society Speech and Language Technical Committee (SLTC), and Machine Learning for Signal Processing Technical Committee (MLSP). He is an IEEE and ISCA Fellow.

    +
    +
    +
    +
    + +
    +
    +

    Dan Povey

    +

    Kaldi

    +

    + Daniel Povey is known for many different contributions to the technology of + speech recognition, including early innovations in sequence training such as + Minimum Phone Error, for the Kaldi toolkit, for "next-gen Kaldi" tools + k2/lhotse/Icefall/sherpa, and for the Librispeech dataset. + + He completed his PhD at Cambridge University in 2003, spent about + ten years working for industry research labs (IBM Research and then Microsoft + Research) and 7 years as non-tenure-track faculty at Johns Hopkins University. + He moved to Beijing, China in November 2019 to join Xiaomi Corporation as Chief + Voice Scientist. + + He is an IEEE Fellow as of 2023. +

    +
    +
    +
    + +
    +
    +
    + +
    +
    +

    Brian McFee

    +

    Librosa

    +

+ Brian McFee is Assistant Professor of Music Technology and Data Science at New York University. + His work lies at the intersection of machine learning and audio analysis. + He is an active open-source software developer and the principal maintainer of the librosa package for audio analysis.

    +
    +
    +
    +
    + +
    +
    +

    Sanchit Gandhi

    +

    HuggingFace

    +

+ Sanchit Gandhi is an ML Engineer at Hugging Face. He leads the open-source audio team and maintains the audio models in the Transformers library, with the goal of making state-of-the-art speech recognition models more accessible to the community. Sanchit’s research interests lie in robust, generalisable speech recognition. Prior to working at Hugging Face, Sanchit completed his Master's degree at the University of Cambridge. +

    +
    +
    +
    + +
    +
    +
    + +
    +
    +

    Zhaoheng Ni

    +

    Torchaudio

    +

+Zhaoheng Ni is a research scientist on the PyTorch Audio team at Meta. He graduated from the City University of New York, supervised by Professor Michael Mandel, and then joined Meta AI as a research scientist in 2021. His research interests are single-channel and multi-channel speech enhancement, speech separation, and robust ASR.

    +
    +
    +
    + +
    +
    +
    +

Organizing Committee

    +
    + +
    +
    +
    + +
    +
    +

    Titouan Parcollet, SpeechBrain, Samsung AI Cambridge.

    +

+Titouan is a Research Scientist at the Samsung AI Research center in Cambridge (UK) and a visiting scholar at the Cambridge Machine Learning Systems Lab of the University of Cambridge (UK). +Previously, he was an Associate Professor in computer science at the Laboratoire Informatique d’Avignon (LIA) of Avignon University (FR). +He was also a senior research associate at the University of Oxford (UK) within the Oxford Machine Learning Systems group. He received his PhD in +computer science from the University of Avignon (FR), in partnership with Orkis, focusing on quaternion neural networks, automatic speech recognition, +and representation learning. His current work involves efficient speech recognition, federated learning, and self-supervised learning. He is also currently +collaborating with the University of Montréal (Mila, QC, Canada) as the co-leader of the SpeechBrain project.

    +
    +
    + + +
    +
    + +
    +
    +

    Cem Subakan, SpeechBrain, Université Laval

    +

+Cem is an Assistant Professor at Université Laval in the Computer Science and Software Engineering department. +He is also currently an Affiliate Assistant Professor in the Concordia University Computer Science and Software Engineering Department, +and an invited researcher at Mila, Québec AI Institute. He received his PhD in Computer Science from the University of Illinois at Urbana-Champaign (UIUC), +and did a postdoc at Mila, Québec AI Institute, and Université de Sherbrooke. He serves as a reviewer for several conferences, including NeurIPS, ICML, ICLR, ICASSP, and MLSP, and +for journals such as IEEE Signal Processing Letters (SPL) and IEEE Transactions on Audio, Speech, and Language Processing (TASL). His research interests include Deep Learning for +Source Separation and Speech Enhancement under realistic conditions, Neural Network Interpretability, and Latent Variable Modeling. He is a recipient of the best paper award +at the 2017 edition of the IEEE Machine Learning for Signal Processing Conference (MLSP), as well as the Saburo Muroga Fellowship from the UIUC CS department. +He is a core contributor to the SpeechBrain project, leading the speech separation part.

    +
    +
    +
    +
    + +
    +
    + +
    +
    +

    Adel Moumen, SpeechBrain, Université d'Avignon

    + +

    +Adel is a research engineer at University of Avignon (FR). He completed his Bachelor's degree in computer science with distinction in an innovation and research-devoted curriculum +and earned a two-year entrepreneurship diploma in 2022. Currently, Adel is participating in a master's apprenticeship program in computer science with specialization in AI, +where he professionally contributes to the development of SpeechBrain, an all-in-one, open-source, PyTorch-based speech processing toolkit. At SpeechBrain, he leads the efforts +of the automatic speech recognition community. +

    +
    +
    + +
    +
    + +
    +
    +

    Mirco Ravanelli, SpeechBrain, Concordia University

    +

+Mirco is an assistant professor at Concordia University, an adjunct professor at Université de Montréal, and a Mila associate member. +His main research interests are deep learning and Conversational AI. He is the author or co-author of more than 60 papers on these research topics. +He received his Ph.D. (with cum laude distinction) from the University of Trento in December 2017. +Mirco is an active member of the speech and machine learning communities. He is the founder and leader of the SpeechBrain project, which aims to build an +open-source toolkit for conversational AI and speech processing.

    +
    +
    +
    + + +
    +
    +
    + +
    +
    +

    François Grondin, SpeechBrain, Université de Sherbrooke

    +

+François has been an Assistant Professor at Université de Sherbrooke (CA) in the Department of Electrical Engineering and Computer Engineering since 2020. +He was a Postdoctoral Associate at the Computer Science & Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (USA) from 2018 to 2020. +He received his PhD in electrical engineering from Université de Sherbrooke (CA) in 2017. He serves as a reviewer for multiple conferences, including ICASSP, INTERSPEECH, RSJ/IROS, +and ICRA, and journals such as Transactions on Audio, Speech and Language Processing, EURASIP Journal on Audio, Speech, and Music Processing, IEEE Transactions on Robotics, IEEE Robotics and +Automation Letters, and IEEE Transactions on Pattern Analysis and Machine Intelligence. His current work involves multichannel speech enhancement, +sound source localization, ego-noise suppression, sound classification, +robot audition, and hybrid signal processing/machine learning approaches. He contributed to the multichannel processing tools in the SpeechBrain project.

    +
    +
    + +
    +
    + +
    +
    +

    Andreas Nautsch, SpeechBrain, Université d'Avignon

    +

+Andreas was a research engineer at Université d'Avignon, where he co-maintained SpeechBrain, readying it for its next major version release. +Andreas served as project editor for ISO/IEC 19794-13, co-organised editions of the VoicePrivacy and ASVspoof challenges, co-initiated the ISCA SIG on Security & Privacy in Speech Communication (SPSC), +was an Associate Editor for the EURASIP Journal on Audio, Speech, and Music Processing, and co-led the 2021 Lorentz workshop on Speech as Personal Identifiable Information. By 2020, he led multidisciplinary +publication teams composed of speech & language technologists, legal scholars, cryptographers, and biometric experts; Andreas co-responded on behalf of ISCA SIG-SPSC to the public consultation of the 2021 EDPB +guidelines on virtual voice assistants. In 2023, he joined goSmart to lead the solution development (architecture, design, and implementation) for private 5G-based campus networks.

    +
    +
    +
    + + + + +
    + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/tutorial_advanced.html b/tutorial_advanced.html index 466dd2b..891ecf2 100644 --- a/tutorial_advanced.html +++ b/tutorial_advanced.html @@ -4,11 +4,12 @@ - - - - - SpeechBrain + + + + SpeechBrain Tutorials @@ -26,35 +27,28 @@
    @@ -62,155 +56,71 @@
    - +
    - +
    -
    -

    Join our official Discourse to discuss with SpeechBrain users coming from all around the world!

    -
    -
    -
    -
    -
    - -
    -
    -
    -

    Pre-trained Models and Fine-Tuning with drawing

    -

    Training DNN models is often very time-consuming and expensive. - For this reason, whenever it is possible, using off-the-shelf pretrained - models can be convenient in various scenarios. - We provide a simple and straightforward way to download and instantiate a - state-of-the-art pretrained-model from drawing HuggingFace drawing and use it either for direct inference or - or fine-tuning/knowledge distillation or whatever new fancy technique you can come up with!

    - Open in Google Colab -
    -
    -
    -
    -
    - -
    -
    -
    -

    Text Tokenizer

    -

    Machine Learning tasks that process text may contain thousands of vocabulary - words which leads to models dealing with huge embeddings as input/output - (e.g. for one-hot-vectors and ndim=vocabulary_size). This causes an important consumption of memory, - complexe computations, and more importantly, sub-optimal learning due to extremely sparse and cumbersome - one-hot vectors. In this tutorial, we provide all the basics needed to correctly use the SpeechBrain Tokenizer relying - on SentencePiece (BPE and unigram).

    - Open in Google Colab -
    -
    -
    -
    - -
    -
    -
    diff --git a/tutorial_asr.html b/tutorial_asr.html new file mode 100644 index 0000000..891ecf2 --- /dev/null +++ b/tutorial_asr.html @@ -0,0 +1,147 @@ + + + + + + + + + + SpeechBrain Tutorials + + + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    + + + +
    + +
    + + + +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/tutorial_basics.html b/tutorial_basics.html index 7ad7eb5..891ecf2 100644 --- a/tutorial_basics.html +++ b/tutorial_basics.html @@ -4,11 +4,12 @@ - - - - - SpeechBrain + + + + SpeechBrain Tutorials @@ -26,35 +27,28 @@
    @@ -62,259 +56,71 @@
    - +
    - +
    -
    -

    Join our official Discourse to discuss with SpeechBrain users coming from all around the world!

    -
    -
    -
    -
    - -
    - -
    -
    -
    -

    Brain Class

    -

    One key component of deep learning is iterating the dataset multiple times and performing parameter updates. - This process is sometimes called the "training loop" and there are usually many stages to this loop. - SpeechBrain provides a convenient framework for organizing the training loop, in the form of a class known as the "Brain" class, - implemented in speechbrain/core.py. In each recipe, we sub-class this class and override the methods for which the default - implementation doesn't do what is required for that particular recipe.

    - Open in Google Colab -
    -
    -
    -
    -
    - -
    -
    -
    -

    HyperPyYAML

    -

    An essential part of any deep learning pipeline is the definition of hyperparameters and other metadata. - These data in conjunction with the deep learning algorithms control the various aspects of the pipeline, - such as model architecture, training, and decoding. At SpeechBrain, we decided that the distinction between - hyperparameters and learning algorithms ought to be evident in the structure of our toolkit, so we split our - recipes into two primary files: experiment.py and hyperparams.yaml. The hyperparams.yaml file is in a - SpeechBrain-developed format, which we call "HyperPyYAML". We chose to extend YAML since it is a highly - readable format for data serialization. By extending an already useful format, we were able to create an - expanded definition of hyperparameter, keeping our actual experimental code small and highly readable.

    - Open in Google Colab -
    -
    -
    -
    -
    - -
    -
    -
    -

    Data Loading Pipeline

    -

    Setting up an efficient data loading pipeline is often a tedious task which involves creating the examples, - defining your torch.utils.data.Dataset class as well as different data sampling and augmentations strategies. - In SpeechBrain we provide efficient abstractions to simplify this time-consuming process without sacrificing - flexibility. In fact our data pipeline is built around the Pytorch one.

    - Open in Google Colab -
    -
    -
    -
    - -
    - -
    -
    -
    -

    Multi-GPU Considerations

    -

    SpeechBrain provides two different methods to use multiple GPUs. - These solutions follow PyTorch standards and allow for intra- or cross-node training. In this tutorial, the use of Data Parallel (DP) and Distributed Data Parallel (DDP) within SpeechBrain are explained.

    - Open in Google Colab -
    -
    -
    -
    - -
    -
    -
    diff --git a/tutorial_classification.html b/tutorial_classification.html new file mode 100644 index 0000000..891ecf2 --- /dev/null +++ b/tutorial_classification.html @@ -0,0 +1,147 @@ + + + + + + + + + + SpeechBrain Tutorials + + + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    + + + +
    + +
    + + + +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/tutorial_enhancement.html b/tutorial_enhancement.html new file mode 100644 index 0000000..891ecf2 --- /dev/null +++ b/tutorial_enhancement.html @@ -0,0 +1,147 @@ + + + + + + + + + + SpeechBrain Tutorials + + + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    + + + +
    + +
    + + + +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/tutorial_hmm.html b/tutorial_hmm.html deleted file mode 100644 index b4bf50b..0000000 --- a/tutorial_hmm.html +++ /dev/null @@ -1,240 +0,0 @@ - - - - - - - - - - - SpeechBrain - - - - - - - - - - - - - - - -
    - -
    - - - -
    - -
    - - - -
    - -
    - - - -
    -
    -
    -
    -
    -
    - -
    -
    -
    -

    Minimal Example

    -

    Duis faucibus consequat nisi, id dapibus velit tristique vitae. Donec cursus dolor nulla, eget iaculis libero elementum in. Curabitur porttitor suscipit velit, at imperdiet orci viverra id. Praesent vitae nisi convallis, bibendum purus sagittis, mattis ipsum. Fusce pulvinar mi sit amet lorem cursus ultricies vel sit amet velit. Quisque id aliquet purus. Integer vitae lectus ac lorem porttitor lobortis sed quis lorem. Quisque posuere non libero eu consequat. Curabitur porttitor lacus sed dolor dapibus posuere. Mauris luctus dolor felis, sit amet bibendum lectus molestie eu. Sed id odio ligula. Maecenas interdum lorem vel varius malesuada. Nunc euismod erat a molestie blandit. Interdum et malesuada fames ac ante ipsum primis in faucibus. Fusce sed dapibus augue, id commodo diam.

    - Open in Google Colab -
    -
    -
    -
    -
    - -
    -
    -
    -

    Lattice Free MMI

    -

    Duis faucibus consequat nisi, id dapibus velit tristique vitae. Donec cursus dolor nulla, eget iaculis libero elementum in. Curabitur porttitor suscipit velit, at imperdiet orci viverra id. Praesent vitae nisi convallis, bibendum purus sagittis, mattis ipsum. Fusce pulvinar mi sit amet lorem cursus ultricies vel sit amet velit. Quisque id aliquet purus. Integer vitae lectus ac lorem porttitor lobortis sed quis lorem. Quisque posuere non libero eu consequat. Curabitur porttitor lacus sed dolor dapibus posuere. Mauris luctus dolor felis, sit amet bibendum lectus molestie eu. Sed id odio ligula. Maecenas interdum lorem vel varius malesuada. Nunc euismod erat a molestie blandit. Interdum et malesuada fames ac ante ipsum primis in faucibus. Fusce sed dapibus augue, id commodo diam.

    - Open in Google Colab -
    -
    -
    -
    - - -
    -
    -
    -
    -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/tutorial_nn.html b/tutorial_nn.html index 31a80bf..891ecf2 100644 --- a/tutorial_nn.html +++ b/tutorial_nn.html @@ -4,11 +4,12 @@ - - - - - SpeechBrain + + + + SpeechBrain Tutorials @@ -26,35 +27,28 @@
    @@ -62,153 +56,71 @@
    - +
    - +
    -
    -

    Join our official Discourse to discuss with SpeechBrain users coming from all around the world!

    -
    -
    -
    -
    - - -
    - -
    -
    -
    -

    Complex and Quaternion Neural networks

    -

    This tutorial demonstrates how to use the SpeechBrain implementation of complex-valued and quaternion-valued neural networks - for speech technologies. It covers the basics of highdimensional representations and the associated neural layers : - Linear, Convolution, Recurrent and Normalisation.

    - Open in Google Colab -
    -
    -
    -
    - -
    - -
    -
    -
    -

    Recurrent Neural Networks and SpeechBrain

    -

    Recurrent Neural Networks (RNNs) offer a natural way to process sequences. - This tutorial demonstrates how to use the SpeechBrain implementations of RNNs including LSTMs, GRU, RNN and LiGRU a specific recurrent cell designed - for speech-related tasks. RNNs are at the core of many sequence to sequence models.

    - Open in Google Colab -
    -
    -
    -
    - - -
    -
    -
    diff --git a/tutorial_processing.html b/tutorial_processing.html index 4c7f5c8..891ecf2 100644 --- a/tutorial_processing.html +++ b/tutorial_processing.html @@ -4,11 +4,12 @@ - - - - - SpeechBrain + + + + SpeechBrain Tutorials @@ -26,35 +27,28 @@
    @@ -62,284 +56,71 @@
    - +
    - +
    -
    -

    Join our official Discourse to discuss with SpeechBrain users coming from all around the world!

    -
    -
    -
    -
    - -
    - -
    -
    -
    -

    Speech Augmentation

    -

    A popular saying in machine learning is "there is no better data than more data". However, collecting new data can be expensive - and we must cleverly use the available dataset. One popular technique is called speech augmentation. The idea is to artificially - corrupt the original speech signals to give the network the "illusion" that we are processing a new signal. This acts as a powerful regularizer, - that normally helps neural networks improving generalization and thus achieve better performance on test data.

    - Open in Google Colab -
    -
    -
    -
    -
    - -
    -
    -
    -

    Fourier Transform and Spectrograms

    -

    In speech and audio processing, the signal in the time-domain is often transformed into another domain. - Ok, but why do we need to transform an audio signal? Some speech characteristics/patterns of the signal (e.g, pitch, formats) - might not be very evident when looking at the audio in the time-domain. With properly designed transformations, - it might be easier to extract the needed information from the signal itself. The most popular transformation is the - Fourier Transform, which turns the time-domain signal into an equivalent representation in the frequency domain. - In the following sections, we will describe the Fourier transforms along with other related transformations such as - Short-Term Fourier Transform (STFT) and spectrograms.

    - Open in Google Colab -
    -
    -
    -
    -
    - -
    -
    -
    -

    Speech Features (MFCC, FBANK)

    -

    Speech is a very high-dimensional signal. For instance, when the sampling frequency is 16 kHz, - we have 16000 samples for each second. Working with such very high dimensional data can be critical from a machine learning perspective. - The goal of feature extraction is to find more compact ways to represent speech.

    - Open in Google Colab -
    -
    -
    -
    -
    - -
    -
    -
    -

    Environmental corruption

    -

    In realistic speech processing applications, the signal recorded by the microphone is corrupted by noise and reverberation. - This is particularly harmful in distant-talking (far-field) scenarios, where the speaker and the reference microphone are distant - (think about popular devices such as Google Home, Amazon Echo, Kinect, and similar devices).

    - Open in Google Colab -
    -
    -
    -
    - -
    - -
    -
    -
    -

    Multi-microphone Beamforming

    -

    Using a microphone array can be very handy to improve the signal quality - (e.g. reduce reverberation and noise) prior to performing speech recognition tasks. - Microphone arrays can also estimate the direction of arrival of a sound source, and this information can later - be used to "listen" in the direction of the source of interest.

    - Open in Google Colab -
    -
    -
    -
    - -
    -
    -
    diff --git a/tutorial_separation.html b/tutorial_separation.html new file mode 100644 index 0000000..891ecf2 --- /dev/null +++ b/tutorial_separation.html @@ -0,0 +1,147 @@ + + + + + + + + + + SpeechBrain Tutorials + + + + + + + + + + + + + + + +
    + +
    + + + +
    + +
    + + + +
    + +
    + + + +
    +
    +
    +
    + + + + + + + + + + + + + + + + + + + + + + + + + + +