Machine learning tools for waveform analysis.
Recommended to use conda installation manager. https://docs.conda.io/projects/conda/en/latest/user-guide/install/
First, install prerequisites:
conda install numpy scipy matplotlib gitpython yaml hdf5 h5py numba pytz
Then install pytorch version 1.6 or higher, see https://pytorch.org/get-started/locally/
On linux with GPU support, cuda version 10.2, this command is
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
With no GPU support, the command is
conda install pytorch torchvision torchaudio cpuonly -c pytorch
Then, install pytorch-lightning and optuna
conda install -c conda-forge optuna
Pytorch lightning requires version 1.0 or higher:
conda install pytorch-lightning -c conda-forge
To run the sparse convolutional network code, install spconv or SparseConvNet
install boost:
sudo apt-get install libboost-all-dev
make sure cmake >= 3.13.2 is installed then install spconv:
git clone https://github.com/traveller59/spconv --recursive
cd spconv
python setup.py bdist_wheel
cd ./dist
pip install <name of .whl file>
follow instructions here: https://github.com/facebookresearch/SparseConvNet
Using pip (example is for linux, see pytorch install guide for your system)
pip install numpy
pip install h5py
pip install gitpython
pip install torch==1.9.0+cpu torchvision==0.10.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install optuna
pip install pytorch-lightning
python main.py <name of config.json file>
output is logged in
{model folder}/runs/<experiment name>/version_<n>
where {model folder} defaults to
./model/<model name>
and <n>
is the run number of the experiment starting from 0
Set dataset_config.paths to the directories you would like to retrieve hdf5 data from. Set the number of events to draw from this directory with n_train, n_validate, and n_test.
Metadata of events selected will be logged to
{model folder}/datasets/<dataset name>_<dataset type>_config.json
Where <dataset name>
is constructed from the directory names or can be specified via
dataset_config.name
. The <dataset type>
is one of train
, val
, or test
.
Constructed datasets will be saved on disk to
./data/<model_name>/<dataset name>.json
To set the training, test, or validation data sets to use a specific metadata file, use
dataset_config.train_config
, .test_config
, and .val_config
- point these to
the json metadata file.
Requirements for the config file are shown in config_requirements.json. This can be used as a template. A full example config file is found in config/examples
--overfit_batches 0.001 // overfits on a small percentage of the data. Useful for debugging a network
--profiler true // profiles the program, showing time spent in each section. output written in log dir found in <model folder>/runs/<experiment name>/version_<n> where n is the (n-1)th run of the experiment
--limit_test_batches n // if int, limits number of batches used for testing to n batches. If float < 1, uses that fraction of the test batches.
--limit_val_batches n
--limit_train_batches n
--log_gpu_memory all | min_max // logs the gpu usage, set to min_max for only max and min usage logging
--terminate_on_nan true // terminates when nan loss is returned
--auto_lr_find true // starts the training session with a learning rate finder algorithm, prints results / saves to log folder
--row_log_interval n // records tensorboard logs every n batches (default 1)
--log_save_interval n // only writes out tensorboard logs every n batches
see https://pytorch-lightning.readthedocs.io/en/latest/trainer.html#trainer-flags for a complete list of arguments
For performance tweaking on a GPU, set
--profiler=true
then slowly increase the num_workers (under dataset_config/dataloader_params) until you find optimal performance.
Once optimal data loading performance is found, then tune your learning rate. You can set --auto_lr_find=true to find an optimal learning rate for your learning scheduler.
Once a good learning rate is chosen, set up a hyperparameter optimization config and set
-oc <name of config file or path to config file>.
The search path for the optimize config is the same for model config files. Alternatively, you can add an optuna_config section in the config file.
Results are in
./studies/<experiment name>
Each trial's logs and model checkpoints are saved to the
./studies/<experiment name>/trial_<n>
folder.
You can prune unpromising trials by adding the -p
flag to the command.
You can choose your pruner by setting the pruner
value in your optuna config file to
one of the pruners listed here: https://optuna.readthedocs.io/en/stable/reference/pruners.html
You can pass parameters to the pruner with the pruner_params
config options.
You can choose your sampler by setting the sampler
value in your optuna config file to
one of the samplers listed here: https://optuna.readthedocs.io/en/stable/reference/samplers.html
You can pass parameters to the pruner with the pruner_params
config options.
The default sampler is TPESampler
and the default pruner is MedianPruner
In the console, run the command tensorboard --logdir <log path>
This will serve the log data to http://localhost:6006
Create your own modules by extending the LitPSD class. See https://pytorch-lightning.readthedocs.io/en/latest/child_modules.html for more information.