
Improve sampling routines for Correlation matrices and R interface

Apostolos Chalkis edited this page Mar 15, 2024 · 1 revision

Overview

Sampling correlation matrices is useful for numerous applications in statistics and bioinformatics, and it is a fundamental problem in many Bayesian models. This coding project considers the case of sampling uniformly from the set of correlation matrices. The aim is to improve the relevant computational subroutines in volesti so that correlation matrices can be sampled faster. The contributor will also have to expose these C++ routines in the R interface of volesti, so that users can sample correlation matrices through Rvolesti.

Related work

Sampling correlation matrices is a relatively difficult problem because of three constraints imposed on a square matrix: positive semidefiniteness (the matrix is symmetric with non-negative eigenvalues), fixed unit diagonal elements, and off-diagonal elements bounded in [-1,1]. volesti relies on the geometric representation of correlation matrices in [1] and on the Markov Chain Monte Carlo methods implemented in volesti for sampling from a multivariate truncated distribution.
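To make the three constraints concrete, here is a minimal, dependency-free C++ sketch that checks whether a given matrix is a correlation matrix up to a tolerance. The names `is_correlation_matrix` and `cholesky_succeeds` are hypothetical and are not part of volesti. Positive semidefiniteness is tested by attempting a Cholesky factorization of A + tol·I, which is a simple stand-in for an eigenvalue computation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Returns true if A + shift*I admits a Cholesky factorization, i.e. it is
// numerically positive definite. A small positive shift lets the test accept
// singular positive semidefinite matrices such as the all-ones matrix.
bool cholesky_succeeds(const Matrix& A, double shift) {
    const std::size_t n = A.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double d = A[j][j] + shift;
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (d <= 0.0) return false;  // non-positive pivot: not PSD within tolerance
        L[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = A[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    return true;
}

// Checks the defining constraints of a correlation matrix up to tolerance tol.
bool is_correlation_matrix(const Matrix& A, double tol = 1e-9) {
    const std::size_t n = A.size();
    for (const auto& row : A)
        if (row.size() != n) return false;                        // must be square
    for (std::size_t i = 0; i < n; ++i) {
        if (std::abs(A[i][i] - 1.0) > tol) return false;          // unit diagonal
        for (std::size_t j = 0; j < n; ++j) {
            if (std::abs(A[i][j] - A[j][i]) > tol) return false;  // symmetric
            if (std::abs(A[i][j]) > 1.0 + tol) return false;      // entries in [-1, 1]
        }
    }
    return cholesky_succeeds(A, tol);                             // PSD within tol
}
```

Take the matrix with off-diagonal entries 0.9, -0.9 and 0.9. Every entry lies in [-1,1], yet the function rejects it because its determinant is negative. This shows that the positive semidefiniteness constraint is what makes the feasible set non-trivial.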

Details of your coding project

The student will replace the Eigen routines with Spectra routines to solve the generalized eigenvalue problems (GEPs) that arise in each step of the random walk. She/he will follow the existing implementations in volesti that solve GEPs when sampling from the feasible region of a semidefinite program (SDP). She/he will also expose the C++ functions in Rvolesti by implementing new R and Rcpp functions.
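To see where a GEP appears, use the off-diagonal entries x = (x_ij), i < j, as coordinates. One standard way to write the set of n×n correlation matrices is then as the spectrahedron {x : A(x) = I + Σ_{i<j} x_ij (E_ij + E_ji) ⪰ 0}. A random-walk step moves along the line x + t·v. Because A is affine, A(x + t·v) = A(x) + t·B(v), where B(v) = Σ_{i<j} v_ij (E_ij + E_ji).

The step needs the largest t ≥ 0 for which this matrix stays positive semidefinite. That value comes from the generalized eigenvalue problem B(v) w = μ A(x) w: the step length is -1/μ_min, where μ_min is the smallest eigenvalue. The project targets the Eigen/Spectra routines that solve this GEP.

The dependency-free sketch below computes the same boundary distance a different way, by bisection with a Cholesky feasibility test. It is purely illustrative and is not how volesti does it. All function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Numerical positive-definiteness test: true iff a Cholesky factorization
// M = L L^T can be completed with strictly positive pivots.
bool is_positive_definite(const Matrix& M) {
    const std::size_t n = M.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        double d = M[j][j];
        for (std::size_t k = 0; k < j; ++k) d -= L[j][k] * L[j][k];
        if (d <= 0.0) return false;
        L[j][j] = std::sqrt(d);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = M[i][j];
            for (std::size_t k = 0; k < j; ++k) s -= L[i][k] * L[j][k];
            L[i][j] = s / L[j][j];
        }
    }
    return true;
}

// Returns A + t * B entrywise; A and B must have the same shape.
Matrix affine_combination(const Matrix& A, const Matrix& B, double t) {
    Matrix C = A;
    for (std::size_t i = 0; i < C.size(); ++i)
        for (std::size_t j = 0; j < C[i].size(); ++j) C[i][j] += t * B[i][j];
    return C;
}

// Largest t >= 0 (within tol) such that A + t*B is still positive definite.
// Assumes A is positive definite, i.e. the current point is interior.
// Returns +infinity if no boundary is found (the ray appears unbounded).
double max_step_bisection(const Matrix& A, const Matrix& B, double tol = 1e-12) {
    double lo = 0.0, hi = 1.0;
    // Bracket the boundary by doubling hi until A + hi*B leaves the cone.
    int doublings = 0;
    while (is_positive_definite(affine_combination(A, B, hi))) {
        lo = hi;
        hi *= 2.0;
        if (++doublings > 60) return std::numeric_limits<double>::infinity();
    }
    // Invariant: A + lo*B is positive definite and A + hi*B is not.
    // The iteration cap guarantees termination even for very large brackets.
    for (int it = 0; it < 200 && hi - lo > tol; ++it) {
        const double mid = 0.5 * (lo + hi);
        if (is_positive_definite(affine_combination(A, B, mid))) lo = mid;
        else hi = mid;
    }
    return lo;
}
```

Here is a worked case. Start at the 2×2 correlation matrix with off-diagonal entry 0.5 and move along v = (1). The result is [[1, 0.5+t], [0.5+t, 1]], which stays feasible while 0.5 + t ≤ 1, so the sketch returns approximately 0.5. The GEP formulation gives the same value: μ_min = -2 and -1/μ_min = 0.5. A GEP solver obtains this distance directly from the spectrum, whereas bisection needs many factorizations per step. This cost difference is why efficient eigensolvers matter here.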

Difficulty: Easy

Size

Small (90 hours)

Skills

  • Required: C++, Probability theory, Basic applied math background
  • Preferred: Experience with statistical or other mathematical software is a plus

Expected impact

The project will be a very useful addition to the volesti package. The new routines will contribute to efficient implementations of Bayesian models that learn the covariance matrix and fit a copula to given data.

[1] Efficient Bayesian inference of systemic risk interlinkages, V Arakelian, A Chalkis (2021).

Mentors

  • Apostolos Chalkis <tolis.chal at gmail.com> is a Research Engineer at Quantagonia GmbH. He is an expert in statistical software, computational geometry, and optimization, and has previous GSoC student experience (2018 & 2019) and mentoring experience with GeomScale (from 2020 to 2023).
  • Zafeirakis Zafeirakopoulos is an expert in implementing and benchmarking geometric and algebraic algorithms and has previous GSoC experience with the R-project (2018, 2019).

Students, please contact the mentors after completing at least one of the tests below.

Tests

Students, please do one or more of the following tests before contacting the mentors above.

  • Easy: Compile and run volesti.
  • Medium: Use the existing C++ implementation in volesti to sample correlation matrices.

For tips and references contact the Mentors!

Solutions of tests

Students, please post a link to your test results here.

  • EXAMPLE STUDENT 1 NAME, LINK TO GITHUB PROFILE, LINK TO TEST RESULTS.