An unofficial implementation of the phoneme segmentation method described in *Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks* (INTERSPEECH 2021).
This method requires no additional training and can be easily applied to various speech representations using the S3PRL toolkit.
It works by using dynamic programming to jointly minimize the frame-wise distance to the nearest k-means cluster center and the number of phoneme-like segments in an utterance (see the paper above for details).
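A minimal sketch of that dynamic program is shown below, assuming `features` is a `(T, D)` array of frame-level representations and `centers` is a `(K, D)` k-means codebook; the function name, the `max_len` cap, and the variable names are illustrative, not the repository's actual code.

```python
import numpy as np

def segment(features, centers, lam=35.0, max_len=50):
    """Split T frames into contiguous segments, each assigned to one k-means center.

    Minimizes   sum over segments [ min_k sum_t ||x_t - c_k||^2 ]  +  lam * (#segments)
    with a simple O(T * max_len * K) dynamic program.
    """
    T = len(features)
    # Squared distance from every frame to every center: shape (T, K).
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)

    best = np.full(T + 1, np.inf)   # best[j] = optimal cost of frames [0, j)
    best[0] = 0.0
    back = np.zeros(T + 1, dtype=int)

    for j in range(1, T + 1):
        for i in range(max(0, j - max_len), j):
            # Cost of one segment covering frames [i, j): pick its best single center.
            seg_cost = dists[i:j].sum(0).min() + lam
            if best[i] + seg_cost < best[j]:
                best[j] = best[i] + seg_cost
                back[j] = i

    # Recover segment start frames by backtracking.
    boundaries, j = [], T
    while j > 0:
        boundaries.append(back[j])
        j = back[j]
    return sorted(boundaries)
```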
Segmentation accuracy depends heavily on the parameter lambda, whose optimal value varies greatly with the choice of self-supervised representation.
Lambda is set to 35 by default.
Values between 20 and 50 give about 60% F1 score for the 6th layer of HuBERT.
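As a rough usage sketch, 6th-layer HuBERT features can be extracted with S3PRL and passed to a segmentation routine such as the `segment` sketch above. The file paths are placeholders, and loading the pretrained k-means checkpoint with joblib (as in the fairseq GSLM recipes) is an assumption about its format.

```python
import joblib
import torch
import torchaudio
import s3prl.hub as hub

# S3PRL upstreams expect 16 kHz mono waveforms.
wav, sr = torchaudio.load("utterance.wav")
assert sr == 16000

# Extract hidden states from HuBERT via the S3PRL hub.
model = getattr(hub, "hubert")()
model.eval()
with torch.no_grad():
    hidden_states = model([wav.squeeze(0)])["hidden_states"]

# Index 6 assumes hidden_states[0] is the CNN output followed by transformer layers.
features = hidden_states[6][0].numpy()            # (T, D)

# Cluster centers from a pretrained k-means model (e.g. a fairseq GSLM checkpoint,
# which is saved with joblib and exposes cluster_centers_).
centers = joblib.load("km.bin").cluster_centers_  # (K, D)

boundaries = segment(features, centers, lam=35.0)
print(boundaries)
```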
Also note that while each phoneme-level segment is assigned to a cluster center, the same phoneme appearing in different segments is often assigned to different cluster centers, so this method is less suitable for phoneme discovery.
Given an audio file and its text transcript, we can use forced alignment to obtain reference word/phoneme boundaries and visualize our method's accuracy.
Detailed steps are given in demo.ipynb.
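Once reference boundaries are available, the F1 score can be computed from predicted and forced-alignment boundaries. A minimal sketch, assuming both are given as sorted frame indices and that a prediction within a small tolerance window counts as correct; the function name and default tolerance are illustrative.

```python
import numpy as np

def boundary_f1(pred, ref, tol=2):
    """Precision, recall, and F1 for boundary detection.

    pred, ref: sorted frame indices of predicted / reference boundaries.
    tol: a prediction within `tol` frames of an unmatched reference counts as a hit.
    """
    pred, ref = np.asarray(pred), np.asarray(ref)
    matched = np.zeros(len(ref), dtype=bool)
    hits = 0
    for p in pred:
        if len(ref) == 0:
            break
        # Greedily match each prediction to the closest still-unmatched reference.
        diffs = np.abs(ref - p)
        diffs[matched] = tol + 1
        j = diffs.argmin()
        if diffs[j] <= tol:
            matched[j] = True
            hits += 1
    precision = hits / max(len(pred), 1)
    recall = hits / max(len(ref), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return precision, recall, f1
```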
- The S3PRL toolkit
- Pretrained k-means model (see FAIRSEQ GSLM)
- Add forced alignment and Praat visualization demo
- Add F1 score calculation