Counterfactual Explanation for Recommender Systems - Evaluation

dbis-uibk/CE4RS-Eval

CE4RS-Eval

Counterfactual Explanation for Recommender Systems - Evaluation

In this paper, we critically examine the evaluation of counterfactual explainers, using consistency and explanation sparsity as key principles of effective explanation. Through extensive experiments, we assess how incorporating Top-k recommendations affects the consistency of existing evaluation metrics, and we analyze the impact of explanation size on explainer performance, highlighting its importance as a key determinant of explanation quality.

License: MIT

Repository

This repository contains the code for the paper "A Closer Look at Counterfactual Explanation Metrics for Recommender Systems". We evaluated our claims on three publicly available benchmarks (MovieLens1M, a subset of the Yahoo!Music dataset, and a subset of the Pinterest dataset) using two different recommenders, Matrix Factorization (MF) and Variational Autoencoder (VAE).

Folders

  • Experiments Results: contains the recommender results used for the tables and figures in the paper, plus results for the other configurations discussed in the paper.
  • code: contains several code files:
    • data_processing - preprocessing code that prepares the data for our models.
    • recommenders_architecture - defines the architectures of the recommenders used in the paper (MF, VAE).
    • recommenders_training - code for training the VAE and MLP recommenders.
    • LXR_training - code for training the LXR model to explain a specified recommender (LXR is the only explainer that needs training).
    • metrics - code for evaluating the explainers following the approach of the baseline methods.
    • metricsTopK.py - code for evaluating explainers based on the K-th item of the recommendation list, to address consistency.
    • metricsXpSize.py - code for evaluating methods at different explanation sizes (Explanation Sparsity Metric).
    • help_functions - shared helper functions used by all of the code above.
  • checkpoints: the designated location for saving and loading trained model checkpoints.
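
As a rough illustration of what a Top-k, POS@K-style check involves, the sketch below removes the explaining items from a user's history and tests whether the target recommendation is still in the top K. It is not the code in metricsTopK.py: the toy similarity-based recommender, the function names `rank_items` and `pos_at_k`, and the parameter `target_rank` are all assumptions made for illustration.

```python
from typing import List, Sequence, Set

# Hypothetical toy recommender: an item's score is the summed similarity to
# every item in the user's history. This stands in for the MF/VAE/MLP models.
def rank_items(history: Set[int], candidates: Sequence[int],
               sim: Sequence[Sequence[float]]) -> List[int]:
    scores = {i: sum(sim[h][i] for h in history) for i in candidates}
    # Sort by descending score; ties are broken by item id for determinism.
    return sorted(candidates, key=lambda i: (-scores[i], i))

def pos_at_k(history: Set[int], sim: Sequence[Sequence[float]],
             explanation: Sequence[int], target_rank: int = 1, k: int = 5) -> int:
    """Return 1 if the target item stays in the top k after the explanation
    items are removed from the history, else 0 (lower is better)."""
    candidates = [i for i in range(len(sim)) if i not in history]
    # target_rank=1 explains the Top-1 item; larger values explain the
    # item at that position of the original recommendation list.
    target = rank_items(history, candidates, sim)[target_rank - 1]
    perturbed = set(history) - set(explanation)
    return int(target in rank_items(perturbed, candidates, sim)[:k])

# Illustrative similarity matrix (made-up numbers, 5 items).
SIM = [
    [0.0, 0.0, 0.9, 0.1, 0.2],
    [0.0, 0.0, 0.1, 0.5, 0.2],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
HISTORY = {0, 1}

print(pos_at_k(HISTORY, SIM, explanation=[0], k=1))  # item 0 drives the Top-1 item
print(pos_at_k(HISTORY, SIM, explanation=[1], k=1))  # item 1 does not
```

Averaging this 0/1 value over users gives a rate per explainer. Varying `target_rank` is one way to look beyond the Top-1 recommendation, under the assumptions stated above.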

Requirements

  • python 3.10

  • PyTorch 1.13

  • wandb 0.16.3 (the package we used for monitoring the training process)

Installation

Main libraries:

  • PyTorch: as the main ML framework
  • Comet.ml: tracking code, logging experiments
  • OmegaConf: for managing configuration files

First, create a virtual environment for the project.

python3 -m venv .venv
source .venv/bin/activate

Then install the latest version of PyTorch from the official site. Finally, run the following:

pip install -r requirements.txt

Usage

To use this code, follow these steps:

  • Create data to work with by running the data_processing code.
  • In every script, set the "data_name" variable to 'ML1M', 'Yahoo', or 'Pinterest' and the "recommender_name" variable to 'MLP' or 'VAE', or pass these values through the "data" and "recommender" arguments.
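
A minimal sketch of how a script could accept these two settings from the command line is shown below. The flag spellings `--data` and `--recommender`, the default values, and the helper name `parse_args` are assumptions for illustration; check the scripts in the code folder for the exact argument names they accept.

```python
import argparse

def parse_args(argv=None):
    # Hypothetical parser: the flag names and defaults here are assumed,
    # only the allowed values come from the README.
    parser = argparse.ArgumentParser(description="Select dataset and recommender")
    parser.add_argument("--data", choices=["ML1M", "Yahoo", "Pinterest"],
                        default="ML1M", help="dataset name")
    parser.add_argument("--recommender", choices=["MLP", "VAE"],
                        default="VAE", help="recommender name")
    return parser.parse_args(argv)

# Passing an explicit list keeps the example runnable without a real CLI call.
args = parse_args(["--data", "Yahoo", "--recommender", "MLP"])
data_name = args.data
recommender_name = args.recommender
print(data_name, recommender_name)
```

Restricting the values with `choices` makes a misspelled dataset or recommender name fail immediately instead of surfacing later as a missing file or checkpoint.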

Reproducing the Results:

  • After running the preprocessing step, run recommenders_training.py with the "data_name" variable set to 'ML1M', 'Yahoo', or 'Pinterest' and the "recommender_name" variable set to 'MLP' or 'VAE'.
  • From the output checkpoints, choose the recommender you want to explain. Then set the checkpoint file name in LXR_training.py, or pass it as an argument with --directory, and run the script to train the explainer.
  • Finally, run metrics.py to compute the other explainers and evaluate them alongside LXR. The script prints the evaluation numbers; our own outputs are stored in the "Experiments Results" folder.

Results

Top-K recommenders on metric consistency

Comparison of CE methods based on POS@5 (lower is better) across four performance levels of the VAE recommender on the ML-1M dataset. The figure shows how going beyond Top-1 (a) and considering Top-k (b-d) recommendations improves consistency when evaluating CE models. To facilitate clearer comparisons, the values are normalized using min-max normalization, and shading represents the variance in the results.

[Figure: RecLengthFig]
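
For reference, min-max normalization rescales a set of values to the range [0, 1]. The sketch below is a generic version; the function name `min_max_normalize` and the sample numbers are illustrative, and which set of values is normalized together for the figures (per method, per panel, or per dataset) is not stated here.

```python
from typing import List, Sequence

def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up scores, not results from the paper.
print(min_max_normalize([2.0, 4.0, 6.0]))
```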

Explanation Sparsity Metric

Performance of CE methods across three datasets based on the explanation sparsity metric, using the VAE recommender. The evaluation is conducted over eight explanation sizes, providing a comparative analysis of the methods. To facilitate clearer comparisons, the values are normalized using min-max normalization. The results highlight dataset-specific performance variations, reflecting how effective each CE method is at specific sparsity levels.

[Figure: XpSizeFig]
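
One way to picture an evaluation over several explanation sizes is the sweep sketched below: for each size, the highest-ranked explaining items are removed and the recommendation is re-checked. This is not the code in metricsXpSize.py. The name `sparsity_curve`, the `still_in_top_k` callback, the toy weights, and the sizes used are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

def sparsity_curve(
    history: Sequence[int],
    importance: Dict[int, float],
    still_in_top_k: Callable[[List[int]], bool],
    sizes: Sequence[int],
) -> Dict[int, int]:
    """For each explanation size, remove that many top-ranked history items
    and record 1 if the target is still recommended, else 0."""
    ranked = sorted(history, key=lambda i: (-importance[i], i))
    curve = {}
    for size in sizes:
        removed = set(ranked[:size])
        remaining = [i for i in history if i not in removed]
        curve[size] = int(still_in_top_k(remaining))
    return curve

# Toy stand-in for a recommender: the target stays recommended while the
# remaining history items carry a total weight of at least 40.
WEIGHTS = {10: 50, 11: 30, 12: 15, 13: 5}
def toy_check(remaining: List[int]) -> bool:
    return sum(WEIGHTS[i] for i in remaining) >= 40

history = [10, 11, 12, 13]
good_explainer = {i: float(w) for i, w in WEIGHTS.items()}   # ranks true drivers first
bad_explainer = {i: -float(w) for i, w in WEIGHTS.items()}   # ranks them last

print(sparsity_curve(history, good_explainer, toy_check, sizes=[1, 2, 3]))
print(sparsity_curve(history, bad_explainer, toy_check, sizes=[1, 2, 3]))
```

In this toy setup the better explainer flips the recommendation with a smaller explanation, which is the kind of difference a size-aware comparison is meant to expose.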

Metric Consistency on MF Recommender

[Figure: TopKMFRec]

POS consistency on Pinterest dataset

[Figure: TopKpinterestVAE]

Consistency Evaluation effects of Top1 to Top5

[Figure: TopKExcel]

Evaluation based on ONLY Top-1

[Figure: MLP_ML1M_table]

Acknowledgements

Thanks to [LXR] for making their code public.

Citation

If you find the code helpful, please cite this work:

