Added german_cleaners #1642

padmalcom · 2022-10-28T11:40:48Z

I added a German cleaner to avoid lowercasing texts.

BenoitWang · 2022-11-03T11:03:34Z

Hi @padmalcom, thank you for the PR. Could you please explain why you want to avoid lowercasing for German? Are abbreviations used in German? Is this cleaner general for German, or specific to certain datasets? (Sorry, I don't speak German.) Thank you.

padmalcom · 2022-11-03T11:13:05Z

Hi @BenoitWang, sure! In German, nouns start with a capital letter; verbs, adverbs, etc. don't. For example, "Zahlen" (numbers) and "zahlen" (to pay) have different meanings, so case matters. Abbreviations are used, yes. Since the dataset I use is generated from MP3s, with the text extracted by speech-to-text models and fixed by a punctuation model, I'm pretty sure it does not contain abbreviations. The cleaner will work for every dataset.
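
To illustrate the point about case, here is a minimal sketch of a case-preserving cleaner next to a lowercasing one. It is not the code merged in this PR; the function names `german_cleaners_sketch` and `lowercasing_cleaner_sketch` are hypothetical, and the only normalization assumed is whitespace collapsing.

```python
import re

# Matches any run of whitespace (spaces, tabs, newlines).
_whitespace_re = re.compile(r"\s+")


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return _whitespace_re.sub(" ", text)


def german_cleaners_sketch(text: str) -> str:
    """Hypothetical case-preserving cleaner: only collapses whitespace."""
    return collapse_whitespace(text)


def lowercasing_cleaner_sketch(text: str) -> str:
    """Hypothetical lowercasing cleaner, shown only for contrast."""
    return collapse_whitespace(text.lower())


sample = "Die  Zahlen\tzahlen"
print(german_cleaners_sketch(sample))      # noun "Zahlen" and verb "zahlen" stay distinct
print(lowercasing_cleaner_sketch(sample))  # both words become "zahlen"
```

With lowercasing, the noun and the verb map to the same token sequence, which is the ambiguity the case-preserving variant avoids.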

BenoitWang · 2022-11-03T17:14:22Z

OK, I see. Please fix the failing tests (some simple linter issues). You are also welcome to create a list of German abbreviations if you have time.
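
A list of German abbreviations, as suggested above, could look roughly like the sketch below. This is an assumption about the shape such a list might take, not code from this PR: the name `expand_german_abbreviations` is hypothetical and the three entries are a deliberately tiny sample.

```python
import re

# Hypothetical, deliberately small sample of (pattern, expansion) pairs.
# Matching is case-sensitive on purpose, since case carries meaning in German.
_german_abbreviations = [
    (re.compile(r"\bz\.\s?B\."), "zum Beispiel"),
    (re.compile(r"\busw\."), "und so weiter"),
    (re.compile(r"\bDr\."), "Doktor"),
]


def expand_german_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its spelled-out form."""
    for pattern, expansion in _german_abbreviations:
        text = pattern.sub(expansion, text)
    return text


print(expand_german_abbreviations("Dr. Meier mag Obst, z.B. Birnen usw."))
```

Note that each pattern consumes the abbreviation's trailing period, so a sentence-final "usw." loses its full stop; a real list would need to decide how to handle that.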

BenoitWang

Seems good.

padmalcom · 2022-11-22T18:56:32Z

Sorry, what is the issue here?

padmalcom added 2 commits October 28, 2022 13:31

Added german_cleaners

47ffae3

Removed whitespances.

1b39c96

mravanelli requested a review from BenoitWang November 2, 2022 15:57

BenoitWang reviewed Nov 12, 2022

removed whitespaces

3081e61

Update text_to_sequence.py

881446a

BenoitWang approved these changes Nov 22, 2022

BenoitWang merged commit 5063efb into speechbrain:develop Nov 22, 2022
