
Watermark for AudioCraft by Dataset Poisoning

This work aims to make generative audio AIs like AudioCraft output watermarked audio by poisoning their training datasets, which can protect the copyrights of musicians who publish their works online.

There are three main components: the generative model (MusicGen), a watermark generator, and a watermark detector.

The main idea is to fine-tune the generative model on a watermarked dataset, then fine-tune the watermark detector on the output of the fine-tuned model. In this way, we can tell whether a piece of audio was generated by a model trained on watermarked works. The pipeline contains two fine-tuning stages.

The workflow is

$$ \text{1st-stage dataset prep} \rightarrow \text{fine-tune generative AI} \rightarrow \text{2nd-stage dataset prep} \rightarrow \text{fine-tune watermark model} $$

Install

This work is based on AudioCraft, especially MusicGen. Please install its dependencies first.

To install the watermark generator/detector:

pip install audioseal
pip install wavmark
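
As a quick sanity check of the installation, the following snippet embeds and detects an AudioSeal watermark on random audio. This is a minimal sketch based on audioseal's documented API; the model card names are the library's defaults.

import torch
from audioseal import AudioSeal

# Load the default 16-bit watermark generator and detector.
generator = AudioSeal.load_generator("audioseal_wm_16bits")
detector = AudioSeal.load_detector("audioseal_detector_16bits")

# One second of random mono audio at 16 kHz: shape (batch, channels, samples).
wav = torch.randn(1, 1, 16000)
sr = 16000

# The generator returns an additive watermark signal.
watermark = generator.get_watermark(wav, sr)
watermarked = wav + watermark

# detect_watermark returns the probability that the audio is watermarked
# and the decoded 16-bit message.
prob, message = detector.detect_watermark(watermarked, sr)
print(prob, message)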

Dataset (1st stage)

  1. Download the dataset

    We use MusicCaps, released with MusicLM. However, it only contains metadata of the real audio; to get the actual audio, run

    cd your_path_of_the_project
    python dataset/downloader.py  # download the dataset into ./dataset/musiccaps

    Pay attention:

    • The downloader code is from download-musiccaps-dataset, and I have fixed a bug in my code according to this issue.
    • Downloading may take a while, and some audio clips may fail to download due to network issues.
  2. Dataset preprocessing

    Pay attention: all the audio we use is mono.

    The dataset preparation follows the guide. In brief, you need to organize your dataset in the following directory structure (a sketch of the two config files follows this list):

    - root_of_this_project
        - config/dset/audio
            - your_dataset.yaml  # the config of your dataset
        - dataset
            - your_dataset/  # the real data, each audio has one .json and one .wav
        - egs
            - your_dataset/data.jsonl  # metadata of your dataset

    You can simply run the following commands to prepare the downloaded MusicCaps dataset:

    # extract mono audio from musiccaps into ./dataset/musiccaps_mono_10s
    python dataset/data_processor.py --action get_mono
    
    # build dataset without watermark (named 'musiccaps_mono_10s_nonwm')
    python dataset/data_processor.py --action build_mono --model none
    
    # build dataset with watermark using audioseal model (named 'musiccaps_mono_10s_audioseal')
    python dataset/data_processor.py --action build_mono --model audioseal
    
    # optional: build dataset with watermark using wavmark model (named 'musiccaps_mono_10s_wavmark')
    python dataset/data_processor.py --action build_mono --model wavmark
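
For reference, here is a sketch of the two config files, modeled on the example dataset config bundled with AudioCraft; all field values are placeholders to adapt to your own data.

config/dset/audio/musiccaps_mono_10s_nonwm.yaml:

# @package __global__
datasource:
  max_sample_rate: 32000   # adjust to your audio's sample rate
  max_channels: 1          # mono audio
  train: egs/musiccaps_mono_10s_nonwm
  valid: egs/musiccaps_mono_10s_nonwm
  evaluate: egs/musiccaps_mono_10s_nonwm
  generate: egs/musiccaps_mono_10s_nonwm

One line of egs/musiccaps_mono_10s_nonwm/data.jsonl:

{"path": "dataset/musiccaps_mono_10s_nonwm/example.wav", "duration": 10.0, "sample_rate": 32000, "amplitude": null, "weight": null, "info_path": null}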

Fine-tuning the generative AI

According to MusicGen's fine-tuning guide, run

dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=small continue_from=//pretrained/facebook/musicgen-small conditioner=text2music dset=audio/musiccaps_mono_10s_nonwm

where dset is the dataset prepared earlier (here I use the dataset without watermark).

To change the training configuration, see config/solver/musicgen/musicgen_base_32khz.yaml, which corresponds to the solver argument in the dora run command.
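
Because the solver config is composed with Hydra, you can also override individual fields on the command line instead of editing the YAML. A sketch (the overridden fields at the end are illustrative; check the solver YAML for the exact names):

dora run solver=musicgen/musicgen_base_32khz model/lm/model_scale=small continue_from=//pretrained/facebook/musicgen-small conditioner=text2music dset=audio/musiccaps_mono_10s_nonwm dataset.batch_size=4 optim.epochs=20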

Export and test

  • Export checkpoints

    You can find the SIG (shown after instantiating solver for XP) at the beginning of training, then export the best checkpoint by running

    python export_ft_models.py --sig your_SIG --name output_dir_name

    Then, the checkpoint will be exported to checkpoints/{output_dir_name}_{SIG}.

  • Test

    Use export_import_test.ipynb to display the output audio of your fine-tuned model.
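
If you prefer a plain script over the notebook, here is a minimal sketch that loads the exported checkpoint directory and generates one sample; the checkpoint path is a placeholder, and passing a local export directory to MusicGen.get_pretrained follows AudioCraft's export documentation.

import torchaudio
from audiocraft.models import MusicGen

# Placeholder path: the directory produced by export_ft_models.py,
# i.e. checkpoints/{output_dir_name}_{SIG}.
model = MusicGen.get_pretrained("checkpoints/output_dir_name_SIG")
model.set_generation_params(duration=10)

# Generate one clip from a text prompt; output shape is (batch, channels, samples).
wav = model.generate(["upbeat acoustic guitar melody"])
torchaudio.save("test_output.wav", wav[0].cpu(), model.sample_rate)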

Dataset (2nd stage)

To prepare the stage 2 dataset, run

python dataset/stage2_data_prepare.py --pos_ckpt checkpoints/your_ckpt_path1 --neg_ckpt checkpoints/your_ckpt_path2 --set all
  • --pos_ckpt: the checkpoint fine-tuned on the watermarked dataset
  • --neg_ckpt: the checkpoint fine-tuned on the dataset without watermark
  • --set: generate the training set (train), the test set (test), or both (all)

The train_prompts and test_prompts variables in stage2_data_prepare.py are the prompts used to generate audio; you can customize them.
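
For example, they might be set to short text descriptions like these (hypothetical values; the real defaults are in the script):

# In dataset/stage2_data_prepare.py -- illustrative values only.
train_prompts = [
    "a calm piano melody with soft strings",
    "energetic electronic dance track with heavy bass",
]
test_prompts = [
    "jazzy saxophone solo over a walking bassline",
]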

Then, the stage-2 dataset will be generated in the dataset2/ directory.

Fine-tuning the watermark model

To fine-tune the audioseal detector, please see fine_tune_audioseal_detector.

The official training instructions may also be helpful.
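
At a high level, the detector fine-tuning is binary classification on the stage-2 data: audio from the watermark-trained model is labeled positive, audio from the clean model negative. Below is a minimal sketch, not the repo's actual training code; it assumes AudioSeal's documented low-level detector output of shape (batch, 2, frames), where channel 1 holds per-frame watermark probabilities.

import torch
from audioseal import AudioSeal

detector = AudioSeal.load_detector("audioseal_detector_16bits")
optimizer = torch.optim.Adam(detector.parameters(), lr=1e-4)
criterion = torch.nn.BCELoss()

def train_step(wav, labels, sr=16000):
    # wav: (batch, 1, samples); labels: (batch,), 1.0 = from watermark-trained model.
    result, _ = detector(wav, sr)           # result: (batch, 2, frames)
    probs = result[:, 1, :].mean(dim=1)     # average to a clip-level probability
    loss = criterion(probs, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()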
