
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.


zhangshengoo/uis-rnn


UIS-RNN

Overview

This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm. UIS-RNN solves the problem of segmenting and clustering sequential data by learning from examples.

This algorithm was originally proposed in the paper Fully Supervised Speaker Diarization.
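Here is a concrete picture of "segmenting and clustering". Suppose each element of a sequence (for example, each d-vector frame) carries a cluster label such as a speaker ID. The segments are then the contiguous runs of equal labels, and the clusters are the sets of runs that share a label. The helper below is a hypothetical illustration of that output format. It is not part of this library.

```python
from itertools import groupby


def labels_to_segments(labels):
    """Hypothetical helper: collapse per-element labels into segments.

    Returns (start, end, label) tuples with `end` exclusive, so elements
    start..end-1 belong to the segment.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, label))
        start += length
    return segments


# Speaker 'A' talks, then 'B', then 'A' again: three segments, two clusters.
print(labels_to_segments(['A', 'A', 'B', 'B', 'B', 'A']))
# → [(0, 2, 'A'), (2, 5, 'B'), (5, 6, 'A')]
```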


Disclaimer

This open-source implementation differs slightly from the internal one we used to produce the results in the paper, because the internal version depends on some internal libraries.

We CANNOT share the data, code, or model for the speaker recognition system (d-vector embeddings) used in the paper, since the speaker recognition system heavily depends on Google's internal infrastructure and proprietary data.

This library is NOT an official Google product.

Dependencies

This library depends on:

  • Python 3.5+
  • NumPy 1.15.1
  • PyTorch 0.4.0

Tutorial

Run the demo

To get started, simply run this command:

python3 demo.py --train_iteration=20000

This will train a UIS-RNN model using data/training_data.npz, then store the model on disk, perform inference on data/testing_data.npz, print the inference results, and save the approximate accuracy in a text file.
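Cluster labels are arbitrary names. Predicted cluster 0 may correspond to true speaker 'B', for example. Accuracy therefore only makes sense under the best one-to-one mapping between predicted and true labels. The sketch below computes such a score by brute force over all mappings, which is practical only for a handful of clusters. The function name is hypothetical, and this is not necessarily how demo.py computes its approximate accuracy.

```python
from itertools import permutations


def sequence_match_accuracy(truth, predicted):
    """Fraction of positions that agree under the best one-to-one label mapping.

    Hypothetical helper: brute-forces every injective mapping from predicted
    labels to true labels, so it is only practical for a few clusters.
    """
    if len(truth) != len(predicted) or len(truth) == 0:
        raise ValueError('sequences must be non-empty and of equal length')
    truth_labels = list(set(truth))
    pred_labels = list(set(predicted))
    # Pad with None so surplus predicted clusters can map to "no speaker".
    targets = truth_labels + [None] * max(0, len(pred_labels) - len(truth_labels))
    best = 0
    for perm in permutations(targets, len(pred_labels)):
        mapping = dict(zip(pred_labels, perm))
        matches = sum(mapping[p] == t for p, t in zip(predicted, truth))
        best = max(best, matches)
    return best / len(truth)


# Predicted IDs are integers and true IDs are strings; only the grouping matters.
print(sequence_match_accuracy(['A', 'A', 'B', 'B', 'A'], [1, 1, 0, 0, 0]))  # → 0.8
```

For more clusters, the Hungarian algorithm (e.g. scipy.optimize.linear_sum_assignment) finds the same optimal mapping without enumerating permutations.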

Note: the files under data/ contain manually generated toy data for demonstration purposes only. The data are very simple, so the model is expected to reach 100% accuracy on the testing data.
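This README does not describe the layout of data/training_data.npz. The sketch below assumes a layout common for this kind of problem: a 2-D float array with one observation per row, plus one string cluster label per row. It generates well-separated toy data in that spirit. make_toy_sequence and the npz keys sequence and cluster_id are hypothetical. To find the actual key names, inspect the real files (e.g. with np.load(...).files).

```python
import io

import numpy as np


def make_toy_sequence(speaker_turns, observation_dim=16, noise=0.01, seed=0):
    """Hypothetical toy-data generator: one (sequence, cluster_id) pair.

    speaker_turns is a list of (speaker_label, num_frames). Each speaker gets
    a fixed random mean vector, and every frame is that mean plus small
    Gaussian noise, so the clusters are trivially separable.
    """
    rng = np.random.RandomState(seed)
    speakers = sorted({speaker for speaker, _ in speaker_turns})
    means = {s: rng.randn(observation_dim) for s in speakers}
    frames, labels = [], []
    for speaker, num_frames in speaker_turns:
        frames.append(means[speaker] + noise * rng.randn(num_frames, observation_dim))
        labels.extend([speaker] * num_frames)
    sequence = np.concatenate(frames, axis=0)  # shape: (total_frames, observation_dim)
    cluster_id = np.array(labels)              # shape: (total_frames,)
    return sequence, cluster_id


sequence, cluster_id = make_toy_sequence([('A', 5), ('B', 3), ('A', 4)])

# Round-trip through the .npz format in memory; the key names are hypothetical.
buffer = io.BytesIO()
np.savez(buffer, sequence=sequence, cluster_id=cluster_id)
buffer.seek(0)
loaded = np.load(buffer)
print(loaded.files, loaded['sequence'].shape)
```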

Core APIs

The algorithm is implemented in the UISRNN class. First, construct a UISRNN object:

model = UISRNN(args)

Next, train the model by calling the fit() method:

model.fit(train_sequence, train_cluster_id, args)

Once training is done, run the trained model on new sequences by calling the predict() method:

predicted_label = model.predict(test_sequence, args)
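This README does not spell out the expected input shapes. The sketch below assumes that train_sequence is a 2-D array with one observation per row and that train_cluster_id holds one label per row. This is an assumption, so check the docstrings in the model code. Under that assumption, a hypothetical pre-flight check before calling fit() or predict() might look like this:

```python
import numpy as np


def check_inputs(sequence, cluster_id=None):
    """Hypothetical pre-flight check for the assumed input layout.

    sequence: 2-D array-like, one observation (e.g. a d-vector) per row.
    cluster_id: optional 1-D array-like, one label per row (training only).
    Returns (num_rows, observation_dim) or raises ValueError.
    """
    sequence = np.asarray(sequence)
    if sequence.ndim != 2:
        raise ValueError('sequence must be 2-D, got shape %s' % (sequence.shape,))
    if cluster_id is not None:
        num_labels = len(cluster_id)
        if num_labels != sequence.shape[0]:
            raise ValueError('got %d labels for %d rows'
                             % (num_labels, sequence.shape[0]))
    return sequence.shape


# e.g. check_inputs(train_sequence, train_cluster_id) before model.fit(...),
# and check_inputs(test_sequence) before model.predict(...).
print(check_inputs(np.zeros((4, 3)), ['A', 'A', 'B', 'B']))  # → (4, 3)
```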

The available args are defined in model/arguments.py.
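The demo's --train_iteration=20000 flag suggests that args is a parsed command-line namespace. If model/arguments.py builds it with argparse (not confirmed here), the flag would surface as args.train_iteration. The parser below is a hypothetical stand-in that defines only that one flag. The real parser defines more flags and its own defaults.

```python
import argparse


def parse_demo_args(argv=None):
    """Hypothetical stand-in for the parser in model/arguments.py.

    Only --train_iteration comes from this README; the real parser defines
    more flags and its own defaults.
    """
    parser = argparse.ArgumentParser(description='UIS-RNN demo (sketch)')
    parser.add_argument('--train_iteration', type=int,
                        help='number of training iterations')
    return parser.parse_args(argv)


args = parse_demo_args(['--train_iteration=20000'])
print(args.train_iteration)  # → 20000
```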

Run the tests

You can also verify the correctness of this library by running:

sh run_tests.sh

If you fork this library and make local changes, use these tests as a sanity check. The tests also serve as good examples for learning the APIs.

Citations

To cite our paper, please use:

@article{zhang2018fully,
  title={Fully Supervised Speaker Diarization},
  author={Zhang, Aonan and Wang, Quan and Zhu, Zhenyao and Paisley, John and Wang, Chong},
  journal={arXiv preprint arXiv:1810.04719},
  year={2018}
}
