TurkicASR

This repository provides the recipe for the paper Multilingual Speech Recognition for Turkic Languages.

Pre-trained models

The best-performing models are available for download below.

model
turkic_languages_model.zip
all_languages_model.zip

Inference

To convert an audio file to text, make sure it is in WAV format with a sample rate of 16 kHz. Unzip the pre-trained model in the current directory and install the necessary packages by running pip install -r requirements.txt. To transcribe the file, run:

python recognize.py -f <path_to_your_wav>
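
Since the recognizer expects 16 kHz WAV input, it can help to check a file before passing it to recognize.py. The following is a minimal sketch using only the Python standard library; the helper names check_wav, build_command, and transcribe are hypothetical and not part of this repository, and the sketch assumes it is run from the directory that contains recognize.py.

```python
import subprocess
import sys
import wave

EXPECTED_RATE = 16000  # 16 kHz, the sample rate the README asks for


def check_wav(path):
    """Return (ok, sample_rate) for the WAV file at `path`.

    wave.open raises wave.Error if the file is not a readable PCM WAV file.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    return rate == EXPECTED_RATE, rate


def build_command(path):
    """Build the command line shown above: python recognize.py -f <path>."""
    return [sys.executable, "recognize.py", "-f", path]


def transcribe(path):
    """Hypothetical wrapper: validate the file, then call recognize.py."""
    ok, rate = check_wav(path)
    if not ok:
        raise ValueError(f"{path}: expected {EXPECTED_RATE} Hz, got {rate} Hz")
    subprocess.run(build_command(path), check=True)
```

Calling transcribe("sample.wav") fails early with a clear message when the sample rate is wrong, instead of passing unsuitable audio to the model. Files at other rates need to be resampled to 16 kHz with a tool of your choice first.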

Datasets

The recipe uses multiple datasets, including KSC, TSC, USC, and Common Voice version 10.0 for the following languages: Azerbaijani, Bashkir, Chuvash, Kazakh, Kyrgyz, Sakha, Turkish, Tatar, Uzbek, and Uyghur. To train the ASR model, download all of them and specify their paths in conf/lang.conf.

Training

Our code builds upon ESPnet, which must be installed before DNN training. Follow the ESPnet installation guide and place the TurkicASR folder inside the espnet/egs2/ directory. Run the training scripts with ./run.sh

Citation

@Article{info14020074,
AUTHOR = {Mussakhojayeva, Saida and Dauletbek, Kaisar and Yeshpanov, Rustem and Varol, Huseyin Atakan},
TITLE = {Multilingual Speech Recognition for Turkic Languages},
JOURNAL = {Information},
VOLUME = {14},
YEAR = {2023},
NUMBER = {2},
ARTICLE-NUMBER = {74},
URL = {https://www.mdpi.com/2078-2489/14/2/74},
ISSN = {2078-2489}
}
