Support all Types of Languages
Flux9665 released this on 20 May 10:04 · 27 commits to Multi_Language_Multi_Speaker since this release
This release extends the toolkit's functionality and provides new checkpoints.
New Features:
- support for all phonemes in the IPA standard through an extended lookup of articulatory features
- support for some suprasegmental markers in the IPA standard through parsing (tone, lengthening, primary stress)
- praat-parselmouth for greatly improved pitch extraction
- faster phonemization
- word boundaries are added, which are invisible to the aligner and the decoder, but can help the encoder in multilingual scenarios
- tonal languages added, tested, and included in the pretraining (Chinese, Vietnamese)
- Scorer class to inspect data given a trained model and dataset cache (provided pretrained models can be used for this)
- intuitive controls for scaling durations and variance in pitch and energy
- various bug fixes and speed improvements
Note:
- This release breaks backwards compatibility. Make sure you are using the associated pretrained models. Old checkpoints and dataset caches are no longer compatible. Only HiFiGAN remains compatible.
- Work on upcoming releases is already in progress. Improved voice adaptation will be our next goal.
- To use the pretrained checkpoints, download them, create their corresponding directories and place them into your clone as follows (you have to rename the HiFiGAN and FastSpeech2 checkpoints once in place):
...
Models
└─ Aligner
      └─ aligner.pt
└─ FastSpeech2_Meta
      └─ best.pt
└─ HiFiGAN_combined
      └─ best.pt
...
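
The placement step can also be scripted. The following is a minimal sketch, not part of the toolkit: the helper `place_checkpoints` and the keys `aligner`, `fastspeech2` and `hifigan` are hypothetical names chosen for illustration, and only the target paths come from the layout above. It copies each downloaded file to its expected location, creating the directory and renaming the file in one step.

```python
from pathlib import Path
import shutil

# Target layout taken from the release note, relative to the root of the clone.
# The dictionary keys are illustrative labels, not names used by the toolkit.
TARGETS = {
    "aligner": Path("Models/Aligner/aligner.pt"),
    "fastspeech2": Path("Models/FastSpeech2_Meta/best.pt"),
    "hifigan": Path("Models/HiFiGAN_combined/best.pt"),
}


def place_checkpoints(clone_root, downloads):
    """Copy downloaded checkpoints into the expected layout (hypothetical helper).

    clone_root: path to the local clone of the repository.
    downloads:  mapping from a key in TARGETS to the path of the downloaded file,
                whatever name that file has on disk.
    Returns a mapping from key to the path the file was copied to.
    """
    placed = {}
    for key, source in downloads.items():
        target = Path(clone_root) / TARGETS[key]
        # Create e.g. Models/FastSpeech2_Meta if it does not exist yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        # Copying to the target path also performs the required rename.
        shutil.copyfile(source, target)
        placed[key] = target
    return placed
```

To use it, call `place_checkpoints` with the path to your clone and a dictionary that maps each key to wherever the corresponding download was saved. The originals are copied rather than moved, so the downloaded files stay in place.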