My first thought was to look at the Common Voice text corpus samples, but if the focus is more on a smart-speaker use case, it may be worth checking Mycroft Skills. They provide texts (including templates/placeholders) for many languages.
-
Rhasspy's template language is a subset of JSGF, with a few extras to make life easier 🙂
I think the sentence templates and the word lists could be grouped by task/language with a directory structure like
@fquirin A useful feature of the Rhasspy Template Language for your new language models: I have tools that will convert a set of sentence templates and word lists directly into n-gram counts for use with KenLM, etc. I emphasize directly because there is no need for a step where all possible text strings are generated, which could be millions. This lets you compactly describe a corpus, since you can do things like
@thorstenMueller @fquirin One feature I'd like to talk about is the need to modify certain words in a sentence due to the language's grammar constraints. German's gender is a good example, where the determiner changes depending on the noun's gender (der/die/das). In English, I can write a template like
I'm trying to imagine if this kind of thing could be generalized to help solve some of the date and time issues that @fquirin has mentioned in the past.
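To illustrate the point about going from templates to n-gram counts without generating every sentence, here is a toy sketch. It is not how the Rhasspy tools work internally; it assumes a much simpler, hypothetical template shape (a flat list of positions, each holding single-word alternatives, no optionals or nesting). The function name `ngram_counts` and the example words are made up for illustration.

```python
from collections import Counter
from itertools import product
from math import prod


def ngram_counts(slots, n):
    """Count n-grams over every sentence described by `slots`
    without enumerating the sentences themselves.

    `slots` is a list of positions; each position is a list of
    single-word alternatives (a simplifying assumption).
    """
    total = prod(len(slot) for slot in slots)  # number of sentences described
    counts = Counter()
    for i in range(len(slots) - n + 1):
        window = slots[i:i + n]
        # Each concrete n-gram in this window occurs once for every
        # combination of choices made at the positions outside the window.
        per_ngram = total // prod(len(slot) for slot in window)
        for ngram in product(*window):  # only n positions are expanded
            counts[ngram] += per_ngram
    return counts


# Hypothetical template: "turn (on | off) the (kitchen | bedroom | garage) light"
slots = [["turn"], ["on", "off"], ["the"], ["kitchen", "bedroom", "garage"], ["light"]]
bigrams = ngram_counts(slots, 2)
```

The work grows with the size of each n-word window rather than with the number of full sentences, which is the property that matters when a template set describes millions of strings.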
-
@fquirin @thorstenMueller Paulus from Home Assistant and I have started on a sentence database and file format. Would either of you be interested in helping out with German?
-
Hi @synesthesiam, I've had some time to work on the Rhasspy sentences.ini support (still in planning phase) and was wondering if the format supports multiple optionals like:
or maybe:
This is something I use a lot in my regular expressions module.
-
Hello fellow open-source voice assistant creators and enthusiasts 😃 ,
Open-source speech recognition has come a long way, but even with the latest "production-ready" systems the real-time transcription quality quickly degrades if your microphone isn't the best, if your environment is noisy, if you stand more than 2 m away from the microphone (smart-speaker) or if you work in a speech domain that was not very prominent in the training data. Unfortunately the latter is often the case, even for something as obvious as "voice assistants". A simple example: "set a timer" becomes "set a time".
Since v2.7.0 the SEPIA client app and smart-services can set an active "task" parameter that can be used (among other things) to dynamically switch speech recognition models. This is very useful if your service is part of a domain that is typically very complicated to handle because of a very large vocabulary like navigation (thousands of street and city names) or music (thousands of artists, titles and genres).
For example in the default mode the client could use the "general" speech recognition model, optimized for voice assistant input (command & control etc.). Then if you say "play some music" it will switch to the "music" task, activate the optimized model and ask the user "what do you want to hear?". If the service is finished it switches back to the general model.
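The switching flow described above could look roughly like this. It is only a sketch of the idea, not the SEPIA implementation: the class `ModelSwitcher`, its methods and the model identifiers are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

DEFAULT_TASK = "general"  # assumed name of the fallback task


@dataclass
class ModelSwitcher:
    """Tracks the active task and returns the matching language model id."""
    models: dict  # task name -> model identifier (hypothetical values)
    active_task: str = DEFAULT_TASK

    def set_task(self, task):
        # Fall back to the general model if no task-specific one exists.
        self.active_task = task if task in self.models else DEFAULT_TASK
        return self.models[self.active_task]

    def finish_task(self):
        # When the service is done, return to the general model.
        self.active_task = DEFAULT_TASK
        return self.models[DEFAULT_TASK]


switcher = ModelSwitcher({"general": "lm-general", "music": "lm-music"})
music_model = switcher.set_task("music")   # user said "play some music"
general_model = switcher.finish_task()     # music service finished
```

Falling back to the general model for unknown tasks is a design assumption here; it keeps recognition working for services that have no dedicated model yet.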
To make this work I'm planning to train a handful of new language models for the SEPIA STT-Server, and this will require building up a larger database of sentences.
Since this list of sentences could be equally useful to train NLU models I thought about adding additional info to each sentence like intent and maybe even parameters (entities, actions, variables, ... whatever you want to call them ;-)).
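As a starting point for discussion, one annotated sentence could be stored as a single JSON line. The field names (`text`, `intent`, `parameters`, `start`, `end`) and the helper `annotate` are assumptions for illustration only, not an agreed format.

```python
import json


def annotate(text, intent, parameters):
    """Build one annotated sentence record.

    `parameters` maps a parameter name to the exact substring of
    `text` that carries its value; character offsets are derived
    from the first occurrence of that substring.
    """
    spans = []
    for name, value in parameters.items():
        start = text.find(value)
        if start < 0:
            raise ValueError(f"{value!r} not found in {text!r}")
        spans.append({"name": name, "value": value,
                      "start": start, "end": start + len(value)})
    return {"text": text, "intent": intent, "parameters": spans}


record = annotate("set a timer for 5 minutes", "set_timer",
                  {"duration": "5 minutes"})
line = json.dumps(record, ensure_ascii=False)  # one record per line in a file
```

For language model training only the `text` field would be read, while an NLU trainer could use the intent and the parameter spans from the same file.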
The existing SEPIA sentences are very limited, since most of the NLU is rules-based and the format hasn't really aged well 🙈 , so I think it is best to start from scratch and develop the new system hand-in-hand with a new NLU module for the SEPIA pipeline (complementing the existing ones).
In this thread I'd like to discuss ideas about the file format, how to store, use and expand the data. Here are some ideas:
assistant, yes_or_no, smart_home, control, time, schedule, numbers, math, web_search, news_search, music, navigation, conversation, translation
I hope this data will become useful to more open-source projects, so questions, ideas, contributions are very welcome! 🙂
@synesthesiam I hope this will be interesting for Rhasspy as well. Maybe you already have a corpus to start from? 😁
@thorstenMueller It would be awesome to have you on board for this 😃. Maybe there is some data in the OpenVoice-Tech Wiki?
I'll start adding some examples and suggestions to the mentioned repository in the next few days.