Sravan Jayanthi
The goal is to create a machine learning model that can generate near-authentic classical music. This repo ablates across several RNN- and attention-based approaches to compare their performance at this task. The model preprocesses the input MIDI files and encodes them based on the duration and chord of the notes being played. The encoded music data is then organized into batches to train the model so that, given a sequence of priming notes, it can generate a realistic sequence of chords that forms a melodic song. The newly created music is then decoded so that the encodings used by the model are translated back into their corresponding musical notes.
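As a rough illustration of the encoding step described above, here is a minimal sketch assuming the `music21` library is used to parse MIDI files and that notes and chords are mapped to pitch+duration string tokens; the actual `encode.py`/`vocab.py` scheme may differ.

```python
# Hypothetical illustration of the MIDI -> token encoding described above;
# the real encode.py / vocab.py may use a different scheme.
from music21 import converter, note, chord

def encode_midi(path):
    """Parse a MIDI file and emit pitch+duration tokens for notes and chords."""
    tokens = []
    stream = converter.parse(path)
    for element in stream.flatten().notes:  # notes and chords in playback order
        if isinstance(element, note.Note):
            pitch = element.pitch.nameWithOctave  # e.g. "C4"
        elif isinstance(element, chord.Chord):
            pitch = ".".join(p.nameWithOctave for p in element.pitches)  # e.g. "C4.E4.G4"
        else:
            continue
        duration = element.duration.quarterLength  # length in quarter notes
        tokens.append(f"{pitch}_{duration}")
    return tokens
```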
This project contains a model training script and a music prediction script, along with associated encoding, decoding, and vocabulary implementations.
File/Directory | Purpose |
---|---|
`model.py` | Train the RNN model |
`predict.py` | Generate sequence of music |
`vocab.py` | Parameters for encoding |
`encode.py` | Codify the musical notes |
`decode.py` | Translate back to notes |
`transformer.py` | Transformer model implementation |
`music/` | Classical music input |
`generated/` | Synthesized music output |
`training_checkpoints/` | Trained model weights |
`stats/` | Sample translations |
In order to utilize the machine learning model, identify a repository of music from which the model will gather its training data.

- Port the collection of music, in the form of `.mid` or `.midi` files, into the `generated/` folder
- Select a sample to be used as a primer for the model to generate music and place it in the `generated/sample` folder
- Train the GRU/LSTM/Transformer model in `model.py` with the desired training parameters specifying the size and scope of the algorithm
- Execute the script `model.py` with the requisite dependencies installed; this will generate the model weights, which will be stored in the `training_checkpoints/` folder. Command: `python model.py -d <file_directory> -m <model:["gru", "lstm", "transformer"]>` (a sketch of how these flags might be parsed is shown after this list)
- Tune the prediction iteration of the trained model in `predict.py` with the desired parameters
- Execute the script `predict.py`, which will sample the primer and synthesize a new song that will be written to the `generated/` folder
- Play your wonderful artistic piece and enjoy!
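The exact flag handling lives in `model.py`; the snippet below is only a hedged sketch of how the `-d` and `-m` arguments from the command above could be parsed with `argparse`. The long option names and help strings are illustrative assumptions, not taken from the repo.

```python
# Illustrative only: a possible argparse setup matching the command above;
# the actual model.py may parse its arguments differently.
import argparse

parser = argparse.ArgumentParser(description="Train a music-generation model.")
parser.add_argument("-d", "--directory", required=True,
                    help="Directory containing the .mid/.midi training files")
parser.add_argument("-m", "--model", choices=["gru", "lstm", "transformer"],
                    default="gru", help="Which architecture to train")
args = parser.parse_args()

print(f"Training a {args.model} model on files in {args.directory}")
```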
Use the given conda environment export in `environment.yml` and follow the TensorFlow instructions for enabling GPU usage on your machine.
Notes about the dataset, which can be found here:
- 1276 files, 300 GB of compressed audio data
- 20.6 million tokens
Architecture Parameters
- 312 vocab size (note tokens, duration tokens, tempo tokens, special tokens)
- 100-token sequence length
- 256-dim embedding size
Training Parameters
- 256 batch size
- 40 epochs
- 0.0001 learning rate
- [0.8, 0.1, 0.1] train/validation/test split
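As a rough illustration of how the architecture and training parameters above could be wired into a Keras model: the sketch below assumes a single recurrent layer and a hidden size of 512, neither of which is stated in this README, so treat it as an assumption rather than the actual `model.py`.

```python
# Hedged sketch: one way to combine the parameters listed above into a Keras GRU model.
# The actual model.py may stack layers differently; the 512 hidden units are an assumption.
import tensorflow as tf

VOCAB_SIZE = 312        # note, duration, tempo, and special tokens
SEQ_LENGTH = 100        # tokens per training sequence (used when batching the data)
EMBED_DIM = 256         # embedding dimensionality
LEARNING_RATE = 1e-4

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    tf.keras.layers.GRU(512, return_sequences=True),   # hidden size assumed, not from the README
    tf.keras.layers.Dense(VOCAB_SIZE),                  # logits over the token vocabulary
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```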
Code for predicting notes based on previously played music, using generative random sampling.
predictions = tf.squeeze(predictions, 0)            # drop the batch dimension
predictions = predictions / TEMPERATURE             # scale logits by the sampling temperature
predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()  # sample the next token id
input_eval = tf.expand_dims([predicted_id], 0)      # feed the sampled token back as the next input
music_generated.append(predicted_id)                # accumulate the generated sequence
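For context, here is a hedged sketch of the generation loop that the snippet above would sit inside. Names such as `NUM_NOTES` and `start_tokens` are illustrative assumptions and not taken from `predict.py`.

```python
# Illustrative wrapper around the sampling snippet above; predict.py may differ.
import tensorflow as tf

TEMPERATURE = 1.0    # >1.0 -> more random output, <1.0 -> more conservative output
NUM_NOTES = 500      # how many tokens to generate (illustrative value)

def generate(model, start_tokens):
    """Autoregressively sample NUM_NOTES tokens, priming the model with start_tokens."""
    input_eval = tf.expand_dims(start_tokens, 0)     # shape: (1, primer_length)
    music_generated = []
    for _ in range(NUM_NOTES):
        predictions = model(input_eval)              # shape: (1, seq_len, vocab_size)
        predictions = tf.squeeze(predictions, 0)     # drop the batch dimension
        predictions = predictions / TEMPERATURE      # temperature-scale the logits
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        input_eval = tf.expand_dims([predicted_id], 0)   # feed the sample back in
        music_generated.append(predicted_id)
    return music_generated
```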
MIDI (Musical Instrument Digital Interface) is a standard protocol that enables electronic musical instruments, computers, and other equipment to communicate, control, and synchronize with each other. MIDI itself does not contain any sound; rather, it is a digital protocol that represents music performance data.
MIDI provides 16 channels for different instruments, and each note event records when a note is played, how long it is held, how hard it is played, and when it is released.
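To make that event model concrete, here is a small sketch assuming the `pretty_midi` library (not necessarily what this repo uses) and a hypothetical file path, showing how those per-note attributes appear when reading a MIDI file.

```python
# Illustrative: inspecting MIDI note events with pretty_midi; the repo itself
# may parse MIDI differently. The file path is hypothetical.
import pretty_midi

midi = pretty_midi.PrettyMIDI("music/example.mid")
for instrument in midi.instruments:
    for n in instrument.notes[:5]:
        print(
            f"pitch={n.pitch}",                   # which note is played
            f"start={n.start:.2f}s",              # when it is played
            f"end={n.end:.2f}s",                  # when it is released
            f"duration={n.end - n.start:.2f}s",   # how long it is held
            f"velocity={n.velocity}",             # how hard it is played
        )
```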
MIT