This repository provides the recipe for the paper Multilingual Speech Recognition for Turkic Languages.
You can download the best-performing models below.
| Pre-trained model |
|---|
| turkic_languages_model.zip |
| all_languages_model.zip |
To convert your audio file to text, make sure it is in WAV format with a 16 kHz sample rate. Unzip the pre-trained model in the current directory and install the required packages by running `pip install -r requirements.txt`. To transcribe the file, run:
python recognize.py -f <path_to_your_wav>
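Before running `recognize.py`, it can help to confirm that the input really is a 16 kHz WAV file. The sketch below uses only Python's standard-library `wave` module; the script name `check_wav.py` and the helper `check_wav` are hypothetical and not part of this repository. It checks the sample rate only and merely reports the channel count, since the required number of channels is not stated here. Note that `wave` reads only uncompressed PCM WAV files.

```python
import sys
import wave

# Sample rate this README requires for input audio.
EXPECTED_RATE = 16_000


def check_wav(path: str, expected_rate: int = EXPECTED_RATE) -> bool:
    """Return True if `path` is a readable WAV file sampled at `expected_rate` Hz.

    wave.open raises wave.Error for non-WAV or non-PCM data, EOFError for
    empty files, and OSError (e.g. FileNotFoundError) for unreadable paths.
    """
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            channels = wav.getnchannels()
    except (wave.Error, EOFError, OSError) as err:
        print(f"{path}: not a readable WAV file ({err})")
        return False
    # Channel count is only reported; the README does not state a requirement.
    print(f"{path}: {rate} Hz, {channels} channel(s)")
    return rate == expected_rate


if __name__ == "__main__" and len(sys.argv) > 1:
    # Check every file (no short-circuit) so all problems are reported at once.
    results = [check_wav(p) for p in sys.argv[1:]]
    sys.exit(0 if all(results) else 1)
```

For example, `python check_wav.py my_audio.wav` exits with status 0 only if every file given is a 16 kHz WAV file.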
The recipe uses multiple datasets, including KSC, TSC, USC, and Common Voice version 10.0 for the following languages: Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Turkish, Tatar, Uzbek, and Uyghur. To train the ASR model, download all of them and specify their paths in `conf/lang.conf`.
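The exact format of `conf/lang.conf` is not described here. Assuming it holds shell-style assignments such as `KSC=/data/ksc` (a common convention in ESPnet recipes, but unverified for this file), the hypothetical helpers `read_paths` and `missing_paths` below report which configured dataset paths do not exist yet, which can catch typos before a long training run. Variables referenced inside values (for example `$DATA/ksc`) are not expanded.

```python
import re
from pathlib import Path

# ASSUMPTION: conf/lang.conf contains shell-style lines such as
#   KSC=/data/ksc    export TSC="/data/tsc"    USC=/data/usc  # comment
# A trailing comment must be preceded by whitespace, as in the shell.
ASSIGNMENT = re.compile(
    r"""^\s*(?:export\s+)?([A-Za-z_]\w*)=(["']?)(.*?)\2(?:\s+#.*)?\s*$"""
)


def read_paths(conf: str) -> dict[str, str]:
    """Collect NAME -> value pairs from assignment lines, skipping comment lines."""
    paths = {}
    for line in Path(conf).read_text(encoding="utf-8").splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = ASSIGNMENT.match(line)
        if match:
            name, _quote, value = match.groups()
            paths[name] = value  # no $VAR expansion is performed
    return paths


def missing_paths(conf: str) -> list[str]:
    """Return the names whose configured path does not exist on disk."""
    return [name for name, value in read_paths(conf).items() if not Path(value).exists()]
```

Running `print(missing_paths("conf/lang.conf"))` from the recipe directory would list any dataset entries that still point to nonexistent locations.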
Our code builds upon ESPnet and requires the framework to be installed beforehand for DNN training. Follow the ESPnet installation guide, then place the TurkicASR folder inside the `espnet/egs2/` directory. Run the training scripts with `./run.sh`.
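The setup steps above can also be scripted. The sketch below is hypothetical: `install_recipe`, `find_run_scripts`, and `run_training` are illustrative names, not part of this repository. Because this README does not say where `run.sh` sits inside the TurkicASR folder, `find_run_scripts` simply searches for it. The sketch assumes ESPnet is already installed and that `espnet_root` is its top-level directory.

```python
import shutil
import subprocess
from pathlib import Path


def install_recipe(recipe_dir: str, espnet_root: str) -> Path:
    """Copy the TurkicASR folder into <espnet_root>/egs2/ and return its new location.

    symlinks=True copies symbolic links as links rather than their targets;
    ESPnet egs2 recipes may link to shared scripts such as utils/ and steps/.
    """
    src = Path(recipe_dir).resolve()
    dest = Path(espnet_root) / "egs2" / src.name
    if dest.exists():
        raise FileExistsError(f"{dest} already exists")
    shutil.copytree(src, dest, symlinks=True)
    return dest


def find_run_scripts(recipe: Path) -> list[Path]:
    """List every run.sh under the installed recipe, sorted for stable output."""
    return sorted(recipe.rglob("run.sh"))


def run_training(run_script: Path) -> None:
    """Execute ./run.sh from its own directory so relative paths resolve; raise on failure."""
    subprocess.run(["./run.sh"], cwd=run_script.parent, check=True)
```

For example, `recipe = install_recipe("TurkicASR", "/path/to/espnet")` followed by `run_training(find_run_scripts(recipe)[0])` would start training, provided a `run.sh` was found.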
@Article{info14020074,
AUTHOR = {Mussakhojayeva, Saida and Dauletbek, Kaisar and Yeshpanov, Rustem and Varol, Huseyin Atakan},
TITLE = {Multilingual Speech Recognition for Turkic Languages},
JOURNAL = {Information},
VOLUME = {14},
YEAR = {2023},
NUMBER = {2},
ARTICLE-NUMBER = {74},
URL = {https://www.mdpi.com/2078-2489/14/2/74},
ISSN = {2078-2489}
}