Installation
DeepSqueak 1.0 was designed and tested with MATLAB 2017b.
To run DeepSqueak, navigate to the main DeepSqueak folder in MATLAB, and type "DeepSqueak" into the command line.
DeepSqueak will add itself to the MATLAB path after running.
Copyright © 2018 by Russell Marx & Kevin Coffey. All Rights Reserved.
1. Call Statistics
   - See Export to Excel
2. Extracted Contour
   - Call contour, slope line.
3. Spectral Gradient of spectrogram
   - Looks neat.
4. Tonality and Soundwave
   - Tonality vs. time, overlaid on the sound wave.
   - Yellow regions are above the tonality threshold.
5. Position in File
Before vocalizations can be detected, an audio folder, neural network folder, and output folder must be selected.
After detecting calls, we recommend using a post hoc denoising network to remove false positives.
To process a single file:
- Select the desired audio file in the "Audio Files" drop-down menu
- Select an appropriate neural network in the "Neural Networks" drop-down menu
- Click "Detect Calls"
- Enter detection settings as described below, and click "OK"

To process files in a batch, or to process a single file with multiple networks:
- Click "Multi Detect"
- A list of the audio files in the current folder will appear. Select the desired audio files; multiple files may be selected by holding the Ctrl key while clicking. Click "OK" to proceed.
- After selecting the audio file(s), a box will appear listing the available neural networks. Select a maximum of two, and click "OK".
- Enter the detection settings as described below, and click "OK".
Detection Settings:
- Total Analysis Length
  - Length, in seconds, of the audio file to process.
  - Set to 0 to process the entire file.
  - If the analysis length is greater than the file duration, a warning will be displayed in the command line and the entire file will be processed.
- Analysis Chunk Length
  - Length, in seconds, of the audio sections to process at a time.
  - Files are processed in short chunks. The maximum length of each chunk depends on available GPU memory, the frequency cutoffs, and the neural network.
  - We found that a GPU with two gigabytes of memory performs well with four-second chunks for short rat and mouse calls, and fifteen-second chunks for long rat calls.
  - If the GPU runs out of memory, a warning message will be displayed in the command line.
- Overlap
  - Amount of overlap between audio chunks, in seconds.
  - The value should be about the length of a call.
- Frequency Cut Off High
  - Regions of the spectrogram above this value are ignored.
- Frequency Cut Off Low
  - Regions of the spectrogram below this value are ignored.
- Score Threshold
  - Detected items with a score (likelihood of being a hit) below this value are automatically removed.
- Power Threshold
  - Detected items with an amplitude below this value are automatically removed.
- Append Date to File Name
  - If the value is "1", the detection time will be appended to the end of the file name.
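The chunking scheme above can be sketched as follows. This is an illustrative Python sketch, not DeepSqueak's MATLAB code; the file and chunk lengths are invented:

```python
# Split an audio file into overlapping analysis chunks. Chunk length and
# overlap are in seconds, matching the detection settings described above.

def chunk_bounds(total_len_s, chunk_len_s, overlap_s):
    """Return (start, end) times, in seconds, for overlapping chunks."""
    if chunk_len_s <= overlap_s:
        raise ValueError("chunk length must exceed overlap")
    bounds = []
    start = 0.0
    step = chunk_len_s - overlap_s  # each chunk advances by this much
    while start < total_len_s:
        bounds.append((start, min(start + chunk_len_s, total_len_s)))
        start += step
    return bounds

# A 10 s file in 4 s chunks with 0.5 s of overlap:
print(chunk_bounds(10, 4, 0.5))
```

The overlap ensures a call that straddles a chunk boundary appears whole in at least one chunk, which is why the overlap should be about one call length.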
To load audio, select the folder where the audio is stored by selecting "File > Select Audio Folder".
If the folder is successfully loaded, the audio files will appear in the "Audio Files" drop down menu.
DeepSqueak is capable of reading WAV (*.wav), FLAC (*.flac), and Ultravox (*.UVD) audio files.
DeepSqueak was tested with a sampling frequency of 250 kHz; however, the spectrograms are created using fft windows of constant duration, rather than constant sample numbers, so other sample rates are accepted.
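The point about constant-duration windows can be made concrete with a small sketch (illustrative Python, not DeepSqueak's code; the 1.6 ms window duration is an invented example):

```python
# A window specified in seconds is converted to a per-file sample count,
# so spectrogram time-frequency resolution stays comparable across rates.

def window_samples(window_s, fs):
    """Convert an FFT window duration (s) to a sample count at rate fs (Hz)."""
    return round(window_s * fs)

# The same 1.6 ms window at two different sample rates:
print(window_samples(0.0016, 250_000))  # 400 samples at 250 kHz
print(window_samples(0.0016, 192_000))  # 307 samples at 192 kHz
```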
Detection files are saved in a folder specified in "File > Select Detection Folder".
To convert detection files to other formats, or export call statistics, see Import & Export
Detected calls are saved as a MATLAB structure with the following fields:
- Rate
  - Audio sample rate
- Box
  - Position of the call in the audio file
  - 1x4 matrix: [Begin Time (s), Minimum Frequency (kHz), Duration (s), Frequency Range (kHz)]
- RelBox
  - Position of the call in the Audio field
  - 1x4 matrix: [Begin Time (s), Minimum Frequency (kHz), Duration (s), Frequency Range (kHz)]
- Score
  - Neural network score
- Audio
  - Audio containing the call
- Accept
  - Status of the call: 1 is accepted, 0 is rejected
- Type
  - Call category
- Power
  - Amplitude of the call
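To illustrate the box layout, here is a hypothetical detection written as a plain Python dict (a sketch; the field values are invented, and DeepSqueak itself stores these as a MATLAB structure):

```python
# A hypothetical detected call, mirroring the fields described above.
call = {
    "Rate": 250_000,                   # audio sample rate (Hz)
    "Box": [1.25, 55.0, 0.040, 15.0],  # [Begin (s), Min Freq (kHz), Duration (s), Freq Range (kHz)]
    "Score": 0.97,                     # neural network score
    "Accept": 1,                       # 1 = accepted, 0 = rejected
    "Type": "USV",                     # call category
}

def box_extent(box):
    """Derive (end_time_s, max_freq_khz) from a DeepSqueak-style box."""
    begin, min_freq, duration, freq_range = box
    return begin + duration, min_freq + freq_range

print(box_extent(call["Box"]))  # → (1.29, 70.0)
```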
You probably won't need to do this.
To change the network architecture, edit "TrainSqueakDetector.m"
To create an image database for training a faster-RCNN detector:
- Select "Tools > Network Training > Create Training Data"
- In the dialog box, select all files to create images from.
- Enter the spectrogram settings.
  - FFT window length, overlap, and NFFT are specified in seconds.
  - Amplitude cutoff: all values above this are set to one.
  - Bout length: calls within this distance of each other are placed into a single image.
    - If the value is not equal to zero, only a single file can be processed at a time.
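The bout-length grouping can be sketched like this (an illustrative Python sketch, not DeepSqueak's code; the call times are invented):

```python
# Group sorted call start times into bouts: a gap larger than bout_len
# (seconds) starts a new group, and each group becomes one training image.

def group_into_bouts(start_times, bout_len):
    """Return a list of groups of call start times."""
    groups = []
    for t in start_times:
        if groups and t - groups[-1][-1] <= bout_len:
            groups[-1].append(t)  # close enough: same bout
        else:
            groups.append([t])    # gap too large: new bout
    return groups

print(group_into_bouts([0.1, 0.3, 2.0, 2.1, 9.5], bout_len=0.5))
# → [[0.1, 0.3], [2.0, 2.1], [9.5]]
```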
To train a faster-RCNN detector:
- Select "Tools > Network Training > Train Network"
- In the dialog box, select the training tables to train from (saved in "DeepSqueak\Training\").
- Decide whether or not to use a pre-trained network as the starting point.
- Training will take hours. When finished, a save dialog will appear.
After Calls have been detected, the user may manually or automatically classify calls and remove non-calls.
Loading Call Files
- Select a folder with detection files from "File > Select Detection Folder"
- Select a file in the "Detected Call Files" drop-down menu
- Click "Load Calls"
Navigation
To view the next call in the file, use the left and right arrow keys, "q" and "e" keys, and/or the "Previous Call" and "Next Call" buttons.
Drag the scroll bar above the spectrogram to jump to a position in the file. After clicking on the scroll bar, the arrow keys will also move the slider (to prevent this, click somewhere else on the main window).
Redraw Boxes
To redraw the box, press "d" or click the "Redraw" button, and drag the mouse to create a box over the desired region.
Call Classification
Accept or reject calls with "a" and "r", or the "Accept Call" and "Reject Call" buttons.
Define call categories with "Tools > Call Classification > Add Custom Labels".
Use keys 1-9 to apply a category to a call.
Play Calls
Play the current call through the default sound device by pressing "p" or clicking the "Play Call" button.
Playback rate may be changed under "Tools > Change Playback Rate".
Change Score Threshold:
- Load a call file.
- Select "Tools > Automatic Review > Change Score Threshold"
- Enter a score threshold, and click "OK". All calls below this threshold will be rejected.

Change Power Threshold:
- Load a call file.
- Select "Tools > Automatic Review > Change Power Threshold"
- Enter a power threshold, and click "OK". All calls below this threshold will be rejected.

Remove Rejected Calls:
- Load a call file.
- Select "Tools > Automatic Review > Remove Rejected Calls"
- All rejected calls will be removed from the call file.
Post Hoc Denoising / False Positive Removal
Keyboard Shortcuts:

| Action | Key |
| --- | --- |
| Play Call (rate can be changed in Tools > Change Playback Rate) | p |
| View Next Call | right arrow, e |
| View Previous Call | left arrow, q |
| Accept Call | a |
| Reject Call | r |
| Redraw Box | d |
| Classify Call (change categories in Tools > Call Classification > Add Custom Labels) | 1-9 |
Call statistics, as well as unsupervised clustering, are calculated on spectrotemporal contours.
The contour is extracted by taking the maximum intensity at each time point, where both tonality and amplitude exceed a set threshold.
The tonality and amplitude thresholds may be changed in "Tools > Change Contour Threshold".
Higher values of tonality and amplitude will result in more conservative contours.
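The contour extraction described above can be sketched in a few lines (an illustrative Python sketch, not DeepSqueak's implementation; the spectrogram and tonality values are invented):

```python
# At each time bin, take the frequency bin of maximum intensity, but only
# where that bin's peak amplitude and the frame's tonality pass the thresholds.

def extract_contour(spect, tonality, amp_thresh, ton_thresh):
    """spect: list of columns (one per time bin), each a list of amplitudes
    by frequency bin. tonality: one value per time bin. Returns the peak
    frequency-bin index per time bin, or None where the thresholds fail."""
    contour = []
    for col, ton in zip(spect, tonality):
        peak = max(col)
        if peak >= amp_thresh and ton >= ton_thresh:
            contour.append(col.index(peak))
        else:
            contour.append(None)  # below threshold: no contour point here
    return contour

spect = [[0.1, 0.9, 0.2], [0.8, 0.3, 0.1], [0.05, 0.06, 0.04]]
print(extract_contour(spect, [0.7, 0.6, 0.2], amp_thresh=0.5, ton_thresh=0.3))
# → [1, 0, None]
```

Raising either threshold excludes more time bins, which is why higher values produce more conservative contours.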
Although the call detection network will reject many non-vocal noise events, we have tuned it to value sensitivity over precision.
False positives may be automatically identified and rejected using a post hoc neural network. The network must be located in "DeepSqueak\Denoising Networks\CleaningNet.mat".
To use the post hoc denoiser:
- Select "Tools > Automatic Review > Post Hoc Denoising"
- In the list box, select the detection files to denoise. All noise events found will be classified as "Noise" and rejected.

To train a post hoc denoiser:
- Select "Tools > Network Training > Train Post Hoc Denoiser"
- In the list box, select the detection files to use for training.
  - Calls labeled as "Noise" are used as negative training samples.
  - Accepted calls not labeled as "Noise" are used as positive training samples.
- When training is finished, the new network will be automatically saved as "DeepSqueak\Denoising Networks\CleaningNet.mat".
  - Training will overwrite the older network. It is wise to create a backup of the old network.
In addition to manually classifying calls, DeepSqueak includes two automated methods.
Unsupervised clustering applies feature-based machine learning with k-means to cluster calls, by minimizing the variance between a call's features and the nearest prototype cluster.
Supervised classification uses a convolutional neural network to classify calls based on the spectrogram.
We've found that creating clusters with unsupervised methods, and using the cleanest clusters to train a supervised classification network, results in fast and accurate clustering.
Clusters may be viewed and renamed with "Tools > Call Classification > View Clusters".
The unsupervised clustering function uses k-means on perceptually relevant dimensions of the extracted contour, to place calls into a predefined number of clusters.
Each call is segmented into six partitions. The k-means algorithm operates on the slope and frequency of each partition, as well as the sinuosity of the first and second half of the call, and the call duration.
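The feature vector described above can be sketched as follows. This is an illustrative Python sketch, not DeepSqueak's MATLAB code; it assumes the contour is a list of (time_s, freq_khz) points with strictly increasing times, and it omits the per-dimension weighting entered in the dialog:

```python
import math

def sinuosity(pts):
    """Path length divided by straight-line distance between the endpoints."""
    path = sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))
    chord = math.dist(pts[0], pts[-1])
    return path / chord if chord else 1.0

def contour_features(pts, n_parts=6):
    """Mean frequency and slope of each of n_parts partitions, sinuosity of
    each half of the call, and the call duration."""
    assert len(pts) >= 2 * n_parts, "contour too short to partition"
    feats = []
    step = len(pts) // n_parts
    for i in range(n_parts):
        seg = pts[i * step:(i + 1) * step]
        (t0, f0), (t1, f1) = seg[0], seg[-1]
        feats.append(sum(f for _, f in seg) / len(seg))  # mean frequency (kHz)
        feats.append((f1 - f0) / (t1 - t0))              # slope (kHz/s)
    half = len(pts) // 2
    feats.append(sinuosity(pts[:half]))   # sinuosity of first half
    feats.append(sinuosity(pts[half:]))   # sinuosity of second half
    feats.append(pts[-1][0] - pts[0][0])  # call duration (s)
    return feats

# A short, invented upward-sweeping contour:
pts = [(i * 0.005, 60 + i) for i in range(12)]
print(len(contour_features(pts)))  # 15 features: 6 × (freq, slope) + 2 + 1
```

k-means then operates on these 15-dimensional vectors, one per call.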
To perform unsupervised clustering using k-means:
- Click "Tools > Call Classification > Unsupervised Clustering"
- Select the detection files to cluster, OR select saved contours.
  - After the detection files are processed, you may save the extracted contours for faster loading.
- Choose the clustering method. ARTwarp is still experimental, so k-means is currently recommended.
- Enter the weights (relative importance) of each dimension.
- When asked whether to use an existing model, click "No".
- Enter the number of call categories. Our workflow involved producing more clusters than desired, and training a supervised neural network on the best clusters.
- Once clustering finishes, you will be prompted to save the model. This is optional.
- A new interface will appear, showing the clusters. This interface can also be found under "Tools > Call Classification > View Clusters"
  - Name the clusters by entering a name in the text box. Clusters with the same name will be merged upon saving.
  - View different clusters with the "Next" and "Back" buttons.
  - View more calls within a cluster with the "Next Page" and "Previous Page" buttons.
  - Reject calls by clicking on them. Calls highlighted in red will be rejected upon saving.
  - Update the call files by clicking "Save", or redo the clustering with "Redo".
Calls may be classified with a supervised neural network. This network operates on the spectrogram, rather than the contour.
To use a supervised classifier:
- Select "Tools > Call Classification > Supervised Classification"
- In the list box, select the detection files to classify, and click "OK".

To train a supervised classifier:
- Select "Tools > Network Training > Train Supervised Classifier"
- Select the detection files to use for training.
  - Rejected calls and calls labeled "Noise" will be ignored.
- Specify the spectrogram frequency range, and click "OK".
- Save the network when training finishes.
After classifying calls, you may visualize the syntax graphs, and save the transition probability matrix.
- Click "Tools > Call Classification > Syntax Analysis"
- In the dialog box, select the files you wish to use.
- A dialog box will appear. Enter the maximum call separation, in seconds, to define a bout, and a threshold for excluding uncommon call categories.
- After the files are loaded, a transition matrix and syntax graph will appear.
- A dialog box will appear, giving the option to save the transition matrix and syllable counts as an Excel table.
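A transition probability matrix of this kind can be sketched as follows (an illustrative Python sketch, not DeepSqueak's code; the call labels and bouts are invented):

```python
# Count label-to-label transitions within each bout, then normalize each
# row so it sums to one (a row of transition probabilities per category).
from collections import Counter

def transition_matrix(bouts, labels):
    counts = {a: Counter() for a in labels}
    for bout in bouts:
        for a, b in zip(bout, bout[1:]):  # consecutive calls within a bout
            counts[a][b] += 1
    matrix = {}
    for a in labels:
        total = sum(counts[a].values())
        matrix[a] = {b: counts[a][b] / total if total else 0.0 for b in labels}
    return matrix

bouts = [["Flat", "Trill", "Trill"], ["Flat", "Trill"]]
m = transition_matrix(bouts, ["Flat", "Trill"])
print(m["Flat"])   # → {'Flat': 0.0, 'Trill': 1.0}
print(m["Trill"])  # → {'Flat': 0.0, 'Trill': 1.0}
```

Transitions are only counted within a bout, so the maximum call separation entered above determines which call pairs contribute.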
Sometimes, such as when creating ground truth tables or using multiple networks, it is useful to merge detection files.
To merge files:
- Select "Tools > Merge Detection Files"
- Select all files to merge, with Ctrl-Click.
- Select the corresponding audio file.
Particularly for long 22 kHz rat calls, the detection algorithm may place multiple calls into a single box.
To separate these calls, we apply k-means to the tonality, and create new boxes from unbroken regions of high tonality.
To separate calls:
- Select "Tools > Separate Long 22s"
- In the dialog box, select the detection file.
- In the dialog box, select the corresponding audio file.
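The call-splitting idea described above can be sketched like this (an illustrative Python sketch, not DeepSqueak's implementation; a fixed threshold stands in for the k-means split, and the tonality values are invented):

```python
# Given a per-time-bin tonality trace, find the unbroken above-threshold
# runs; each run would become a new bounding box for one call.

def high_tonality_runs(tonality, thresh):
    """Return (start, end) index pairs of contiguous runs above thresh
    (end is exclusive)."""
    runs, start = [], None
    for i, t in enumerate(tonality):
        if t >= thresh and start is None:
            start = i                  # run begins
        elif t < thresh and start is not None:
            runs.append((start, i))    # run ends
            start = None
    if start is not None:
        runs.append((start, len(tonality)))  # run reaches the end
    return runs

print(high_tonality_runs([0.1, 0.8, 0.9, 0.2, 0.7, 0.7], thresh=0.5))
# → [(1, 3), (4, 6)]
```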
DeepSqueak can export call annotations and statistics to Excel files for further analysis. See Export to Excel.
DeepSqueak can also export spectrograms and audio.
Detected calls may be viewed in the context of an audio file in Raven Lite, by exporting a detection file as a Raven log.
DeepSqueak is capable of importing file annotations from Raven, Ultravox, and MUPET. This is useful for comparing analysis packages and creating ground truth tables.
To export call statistics to an Excel file:
- Select a folder containing detection files with "File > Select Detection Folder"
- Click "File > Import / Export > Export to Excel Log (Call Statistics)"
- In the list box, select the files to be exported, and click "OK"
- Specify whether or not to include rejected calls
- Select the folder to place the Excel files in
The following call statistics are included in the output file:
- ID
  - Call ID
- Label
  - Call category
- Accepted
  - 1 is accepted, 0 is rejected
- Score
  - Neural network score
- Begin Time (s)
  - Calculated from the contour
- End Time (s)
  - Calculated from the contour
- Call Length (s)
  - End Time - Begin Time
- Principal Frequency (kHz)
  - Median frequency of the contour
- Low Freq (kHz)
  - Lowest frequency of the contour
- High Freq (kHz)
  - Highest frequency of the contour
- Delta Freq (kHz)
  - High Freq - Low Freq
- Frequency Standard Deviation (kHz)
  - Standard deviation of the contour
- Slope (kHz/s)
  - Slope of the contour
- Sinuosity
  - Length of the path between the first and last points on the contour, divided by the Euclidean distance between the first and last points
- Max Power
  - Average power of the contour
- Tonality
  - Values near one indicate a more tonal call with a higher signal-to-noise ratio
  - One minus the ratio of the geometric mean to the arithmetic mean of the spectrogram; that is, one minus the spectral flatness (also called the tonality coefficient or Wiener entropy)
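The tonality statistic defined above can be computed directly from its definition (an illustrative Python sketch; the magnitude values are invented):

```python
# Tonality = 1 - (geometric mean / arithmetic mean) of spectrogram
# magnitudes, i.e. one minus the spectral flatness. A flat (noise-like)
# spectrum gives a value near 0; a peaky (tonal) spectrum gives a value
# near 1.
import math

def tonality(magnitudes):
    geo = math.exp(sum(math.log(m) for m in magnitudes) / len(magnitudes))
    arith = sum(magnitudes) / len(magnitudes)
    return 1.0 - geo / arith

print(round(tonality([1.0, 1.0, 1.0, 1.0]), 3))   # flat spectrum → 0.0
print(round(tonality([10.0, 0.1, 0.1, 0.1]), 3))  # peaky spectrum → 0.877
```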
DeepSqueak publication: https://doi.org/10.1038/s41386-018-0303-6