A package for training music emotion recognition models using social media comments. Measuring continuous emotion labels (e.g. valence, arousal) for music is challenging, and typically relies on human annotators who listen to musical samples and rate their perceived emotion. Because emotion is subjective, a large set of annotations is needed per sample in order to make a statistically significant inference about the emotional qualities of a song. This makes music emotion annotation expensive and time-consuming. Our study explores a system for automatically estimating the average emotional response of a listener to a piece of music by using social media comments related to that piece. We provide a system for collecting social media datasets related to musical discourse and for training BERT-like models on this data.
You can find our paper here: https://link.springer.com/chapter/10.1007/978-3-031-44260-5_6
(full text: https://aidan-b1409.github.io/files/music_emotion_sact.pdf)
To install this package:
- Use a Python package management solution (e.g. mamba or conda) to create an environment from the environment.yml file:
  conda env create --name mdp --file environment.yml
- Activate your new environment:
  conda activate mdp
- Install the music_discourse_prediction package into your local environment:
  pip install .
- [Optional] Install the MongoDB server in order to use our data collection pipeline.
Our work consists of two main contributions: the data_mining module, used for collecting social media musical discourse data, and the bert_features module, used for training and evaluating BERT-like pretrained large language models on the task of predicting music valence and arousal targets from social media comments alone.
Our data collection approach depends on a MongoDB instance running on localhost, on the default port 27017. The data_mining module connects to this local database and initializes an API connection to the specified social media service. From this, we form a search query to the social media API strictly including the song title and artist name, and return some subset of the top submissions. For Reddit, this is all submissions returned by the search API; for Twitter, it is the top 100 tweets; and for YouTube, it is the top 50 videos. For each of these top-level submissions, we then pull all reply comments or tweets in response to that original post which explicitly mentions the song title and artist name. Each top-level contribution and reply is stored as a separate record in the posts collection.
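The query and selection rules described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the function names, the quoted-phrase query syntax, and the dictionary layout are assumptions, and only the per-source limits come from the description above.

```python
from typing import Optional

# Per-source cap on top-level submissions, as described above.
# None means "keep everything the search API returns".
TOP_LEVEL_LIMITS: dict[str, Optional[int]] = {
    "reddit": None,
    "twitter": 100,
    "youtube": 50,
}


def build_search_query(title: str, artist: str) -> str:
    """Quote both fields so the search requires title and artist (assumed syntax)."""
    return f'"{title}" "{artist}"'


def mentions_song(text: str, title: str, artist: str) -> bool:
    """Return True if the text names both the song title and the artist."""
    lowered = text.lower()
    return title.lower() in lowered and artist.lower() in lowered


def select_top_level(submissions: list[dict], source: str) -> list[dict]:
    """Keep the top submissions for a source, honoring its cap."""
    limit = TOP_LEVEL_LIMITS[source]
    return submissions if limit is None else submissions[:limit]
```

A case-insensitive substring match is only one way to decide whether a post "explicitly mentions" a song; the real pipeline may be stricter or looser.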
Arguments:
[--dataset]: The dataset from which to pull query songs. Options: deam, amg1608, deezer, pmemo
[--type]: Which social media source to query. Options: youtube, reddit, twitter
[--timestamp]: The last known 'good' timestamp. If an error caused a crash during your data pull, you can query the database instance to find the timestamp of the last successful pull. The bot will resume at the song where it left off.
[--config]: The config file for the scraping bot. For the Reddit bot, this takes the form of a praw.ini file.
In the datasets folder, we provide four datasets of musical samples annotated for valence and arousal: AMG1608, PmEmo, DEAM and Deezer2018. Our data scraping workflow depends on these datasets being loaded into your MongoDB instance. The mongo_songs command lets you quickly load and insert these datasets into your database instance. This script reads the CSV and inserts each song, with its associated valence and arousal label, into the songs collection.
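As a rough sketch of what such a load step involves, the example below reads annotated rows from a CSV and shapes them into documents for a songs collection. The column names, the database name, and the sample row are made up for illustration and are not taken from the package; the insert helper assumes pymongo is installed and a server is running on localhost.

```python
import csv
import io
from typing import Iterator, TextIO


def read_song_records(handle: TextIO) -> Iterator[dict]:
    """Yield one document per CSV row, casting the labels to floats."""
    for row in csv.DictReader(handle):
        yield {
            "song_name": row["song_name"],      # hypothetical column name
            "artist_name": row["artist_name"],  # hypothetical column name
            "valence": float(row["valence"]),
            "arousal": float(row["arousal"]),
        }


def insert_songs(records: list[dict], uri: str = "mongodb://localhost:27017") -> int:
    """Insert records into a songs collection; needs pymongo and a running server."""
    from pymongo import MongoClient  # imported lazily so parsing works without pymongo

    client = MongoClient(uri)
    # "music_discourse" is a placeholder database name, not the package's.
    result = client["music_discourse"]["songs"].insert_many(records)
    return len(result.inserted_ids)


# Made-up sample row, for illustration only.
sample = io.StringIO(
    "song_name,artist_name,valence,arousal\n"
    "Hey Jude,The Beatles,0.62,0.41\n"
)
records = list(read_song_records(sample))
```

In practice the mongo_songs command handles this for you; the sketch only shows the shape of the data that ends up in the database.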
Arguments:
[--input]: The path of the CSV containing the annotated samples.
Note: For the Deezer2018 dataset, the authors define explicit train, test, and validation splits. We retain these splits in separate CSV files. However, our data loading script looks for all three files (deezer_train.csv, deezer_test.csv, and deezer_validation.csv) if any one of them is supplied via --input. So, if you run mongo_songs --input datasets/DEEZER_test.csv, it will load all of the songs from all three Deezer dataset files. Running the load command for all three files is therefore unnecessary.
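One way this sibling-file lookup could work is sketched below. The helper name and the assumption that split files are named prefix_split.csv are illustrative only; the actual script may resolve the files differently.

```python
from pathlib import Path

DEEZER_SPLITS = ("train", "test", "validation")


def deezer_split_paths(path: str) -> list[Path]:
    """Given any one Deezer split file, return the paths of all three splits."""
    given = Path(path)
    # "deezer_test" -> prefix "deezer", split "test"
    prefix, _, split = given.stem.rpartition("_")
    if split not in DEEZER_SPLITS:
        raise ValueError(f"{given.name} is not a recognized Deezer split file")
    return [given.with_name(f"{prefix}_{name}{given.suffix}") for name in DEEZER_SPLITS]
```

For example, passing datasets/deezer_test.csv yields the train, test, and validation paths in the same folder, so a loader can read all three in a single run.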
Each social media service requires an API key in order to request data. You can request API keys here for Reddit, YouTube, and Twitter. Our data pipeline expects Twitter credentials to be provided in a .env file, with the fields TWITTER_BEARER_TOKEN, TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_TOKEN_SECRET, TWITTER_API_KEY, and TWITTER_API_KEY_SECRET. Reddit configuration is provided by the keys REDDIT_CLIENT_ID and REDDIT_CLIENT_SECRET. YouTube authentication must be provided by a separate .json file, which you can generate in the Google Cloud web console (helpful instructions here: https://stackoverflow.com/questions/43367664/get-client-id-and-client-secret-of-the-file-client-secrets-json-of-youtube-api). The first run will trigger an OAuth workflow, which will pair your app with the YouTube Data API and save your session token in a new file titled yt_token.json in the root directory.
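Before starting a long scrape, it can help to confirm that every expected key is present. The sketch below parses simple KEY=VALUE text and reports missing keys. The helper names are hypothetical, the parser handles only the simplest .env syntax, and placing the Reddit keys in the same file is an assumption.

```python
from typing import Mapping

TWITTER_KEYS = (
    "TWITTER_BEARER_TOKEN",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "TWITTER_API_KEY",
    "TWITTER_API_KEY_SECRET",
)
REDDIT_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Mapping[str, str], required: tuple[str, ...]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]
```

Calling missing_keys(parse_env(open(".env").read()), TWITTER_KEYS) would list any Twitter fields still to be filled in.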
To train a new model, use the bert_features command.
Arguments:
[--dataset]: The name of the dataset which the songs come from. Required. Options are AMG1608, DEAM, PmEmo, or Deezer.
[--source]: List of social media sources from which to use comments. Required. Options are [Youtube, Reddit, Twitter].
[--epochs]: Number of epochs to fine-tune for. Optional. Default is 2.
[--batch_size]: Batch size per GPU. Required. Default is 16.
[--length]: Filter command. Drop all comments below a certain number of characters. Optional. Default is 32.
[--score]: Filter command. Drop all comments below a certain number of likes. Optional. Default is 3.
[--model_name]: HuggingFace model name of a BERT-like model. Default: distilbert-base-cased.
[--input_dir]: Path to a valid .csv which contains a dataset of music discourse comments. Optional. Used in place of a MongoDB instance running on localhost.
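To make the defaults and filters concrete, here is a minimal argparse sketch that mirrors the flags listed above. It is not the package's actual parser: the types, the choice of nargs="+" for --source, and the keep_comment helper are assumptions, and --batch_size is modeled as optional with its documented default of 16.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented bert_features flags."""
    parser = argparse.ArgumentParser(prog="bert_features")
    parser.add_argument("--dataset", required=True,
                        choices=["AMG1608", "DEAM", "PmEmo", "Deezer"])
    parser.add_argument("--source", required=True, nargs="+",
                        choices=["Youtube", "Reddit", "Twitter"])
    parser.add_argument("--epochs", type=int, default=2)
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--length", type=int, default=32)
    parser.add_argument("--score", type=int, default=3)
    parser.add_argument("--model_name", default="distilbert-base-cased")
    parser.add_argument("--input_dir", default=None)
    return parser


def keep_comment(body: str, score: int, min_length: int, min_score: int) -> bool:
    """Keep a comment only if it meets both the length and the like thresholds."""
    return len(body) >= min_length and score >= min_score


# Parse an explicit argument list rather than sys.argv, for illustration.
args = build_parser().parse_args(["--dataset", "DEAM", "--source", "Reddit", "Twitter"])
```

With the defaults, a comment needs at least 32 characters and at least 3 likes to survive filtering.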
We provide two options for attaching a dataset to our model training API. By default, the model API will search for a MongoDB instance running on localhost. If an input CSV is provided, the model will use that training dataset instead, and an active MongoDB server will not be required to run the model.
If no input CSV is supplied, bert_features defaults to pulling a social media dataset from a locally hosted MongoDB instance. The program attempts to connect to MongoDB over its default port (27017). We provide a MongoDB image containing our dataset [here]. This database contains two collections, songs and posts. The songs collection contains ~20,000 records of songs labeled for valence and arousal from the AMG1608, PmEmo, DEAM and Deezer2018 datasets. The posts collection contains 20,000,000+ social media comments from Reddit, YouTube, and Twitter which mention any of the songs from the four music emotion recognition datasets used in our study.
You can select subsets of our dataset, slicing by MER dataset using the --dataset flag, and by social media source (Twitter, Youtube, Reddit) with the --source flag.
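Conceptually, these two flags translate into a filter on the posts collection. The sketch below builds a MongoDB-style filter document. The field names dataset and source are hypothetical, since the actual schema of the posts collection is not described here.

```python
from typing import Optional, Sequence


def build_post_filter(dataset: Optional[str] = None,
                      sources: Optional[Sequence[str]] = None) -> dict:
    """Build a MongoDB filter that slices posts by MER dataset and source."""
    query: dict = {}
    if dataset:
        query["dataset"] = dataset  # hypothetical field name
    if sources:
        # $in matches documents whose field equals any listed value.
        query["source"] = {"$in": list(sources)}  # hypothetical field name
    return query
```

With pymongo, a filter like this would be passed to the collection's find method; an empty filter selects every post.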
bert_features also accepts datasets in .csv format. You can generate a .csv dataset from the above MongoDB instance using mongoexport. Using a .csv input saves preprocessing time, as no query to the database server is needed to retrieve the model training data. This can be useful in deployments where it is difficult to run a database server concurrently with the model training (e.g. HPC clusters). You can retrieve a subset of the dataset by filtering for songs from a specific MER dataset (AMG1608, DEAM, Deezer, PmEmo), or posts from a specific social media source (Twitter, YouTube, Reddit). Once you have this CSV, you can provide it to the model with --input. If no --input command is provided, the model will assume a database is in use and attempt to connect to it over localhost.
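As an illustration of the export step, the sketch below assembles a mongoexport command line without running it. The database name, field list, and query fields are placeholders, not the package's real schema. Note that mongoexport requires an explicit field list when exporting as CSV.

```python
import json
import shlex
from typing import Optional, Sequence


def mongoexport_command(out_path: str,
                        fields: Sequence[str],
                        query: Optional[dict] = None,
                        db: str = "music_discourse",   # placeholder database name
                        collection: str = "posts") -> list[str]:
    """Assemble a mongoexport invocation that writes a collection to CSV."""
    cmd = [
        "mongoexport",
        f"--db={db}",
        f"--collection={collection}",
        "--type=csv",
        f"--fields={','.join(fields)}",  # CSV export needs an explicit field list
        f"--out={out_path}",
    ]
    if query:
        # The query is passed to mongoexport as a JSON document.
        cmd.append(f"--query={json.dumps(query)}")
    return cmd


# Hypothetical fields and filter, for illustration only.
cmd = mongoexport_command("posts.csv", ["body", "score", "source"], {"source": "Reddit"})
printable = shlex.join(cmd)  # shell-quoted string you could paste into a terminal
```

Building the command as a list keeps the quoting of the JSON query safe if you later hand it to subprocess.run.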
If you would like to use our code or dataset, please cite our publication here:
@inbook{
Beery_Donnelly_2024,
title={Learning Affective Responses to Music from Social Media Discourse},
ISBN={978-3-031-44260-5},
url={https://doi.org/10.1007/978-3-031-44260-5_6},
DOI={10.1007/978-3-031-44260-5_6},
booktitle={Practical Solutions for Diverse Real-World NLP Applications},
publisher={Springer International Publishing},
author={Beery, Aidan and Donnelly, Patrick J.},
year={2024},
pages={93–119}
}
Developed by: Aidan Beery - mail: beerya@oregonstate.edu
Advised by: Dr. Patrick J. Donnelly - mail: donnellp@oregonstate.edu
Website: http://www.soundbendor.org/