Model Comparison Using PSIS-LOO
Overview
We are often interested in comparing how well different models account for the same data. In this tutorial, we will compare models based on their expected log pointwise predictive density (ELPD), which we will estimate using Pareto smoothed importance sampling leave-one-out cross-validation (PSIS-LOO).
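Concretely, for observations y_1, ..., y_n, the LOO estimate of the ELPD is the sum of the log leave-one-out predictive densities (Vehtari et al., 2017):

\mathrm{elpd}_{\mathrm{loo}} = \sum_{i=1}^{n} \log p(y_i \mid y_{-i}), \qquad p(y_i \mid y_{-i}) = \int p(y_i \mid \theta)\, p(\theta \mid y_{-i})\, d\theta

PSIS-LOO approximates each term from a single set of posterior draws using Pareto smoothed importance weights, so the model does not need to be refit n times.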
First, we'll simulate data from an LBA with fixed parameters. Then, we'll define three LBA models with different fixed values for the k parameter. Finally, we'll use ParetoSmooth.jl to perform model comparison via PSIS-LOO.
Load Packages
Before proceeding, we will load the required packages.
using Random
using SequentialSamplingModels
using LinearAlgebra
using ParetoSmooth
using Turing
Data-Generating Model
The next step is to generate simulated data for comparing the models. Here, we'll use an LBA as the true data-generating model:
Random.seed!(5000)
dist = LBA(ν=[3.0, 2.0], A=0.8, k=0.3, τ=0.3)
data = map(_ -> rand(dist), 1:1000)
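As an optional sanity check, we can summarize the simulated data before fitting any models. The sketch below assumes each simulated trial is a NamedTuple with choice and rt fields (the rt field is also used in the model definition below); adjust the field names if your version of SequentialSamplingModels differs.
using Statistics

# Proportion of trials won by the first accumulator, plus RT summaries.
# Assumes each element of `data` is a NamedTuple with `choice` and `rt` fields.
choices = map(x -> x.choice, data)
rts = map(x -> x.rt, data)
println("proportion of choice 1: ", mean(choices .== 1))
println("mean RT: ", mean(rts), ", minimum RT: ", minimum(rts))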
Specify Turing Models
The following code block defines the model along with its prior distributions using Turing.jl. We'll use this model with different fixed values for the k parameter.
@model function model_LBA(data, k; min_rt = minimum(x -> x.rt, data))
    # Priors
    ν ~ MvNormal(fill(3.0, 2), I * 2)
    A ~ truncated(Normal(0.8, 0.4), 0.0, Inf)
    τ ~ Uniform(0.0, min_rt)
    # Likelihood: one statement per observation so that pointwise
    # log-likelihoods are recorded for PSIS-LOO
    for i in 1:length(data)
        data[i] ~ LBA(; ν, A, k, τ)
    end
end
Note that iterating over the data with a for loop is necessary to produce correct PSIS-LOO estimates: PSIS-LOO requires the log-likelihood of each observation separately, and the loop ensures Turing records one likelihood term per data point.
Estimate the Parameters
Now we'll estimate the parameters using three different fixed k values:
chain_lba1 = sample(model_LBA(data, 2.0), NUTS(), 1000)
chain_lba2 = sample(model_LBA(data, 0.3), NUTS(), 1000)
chain_lba3 = sample(model_LBA(data, 1.0), NUTS(), 1000)
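Before computing PSIS-LOO, it can be worth glancing at the posterior summaries to confirm the chains mixed well (e.g., rhat values close to 1). A minimal sketch, using the describe function provided by MCMCChains (which Turing re-exports):
# Posterior summary statistics and quantiles for one of the fitted models
describe(chain_lba2)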
Compute PSIS-LOO
Next, we will use the psis_loo function to compute PSIS-LOO for each model:
Model 1
loo1 = psis_loo(model_LBA(data, 2.0), chain_lba1)
[ Info: No source provided for samples; variables are assumed to be from a Markov Chain. If the samples are independent, specify this with keyword argument `source=:other`.
Results of PSIS-LOO-CV with 1000 Monte Carlo samples and 1000 data points. Total Monte Carlo SE of 0.14.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ cv_elpd │ 359.33 │ 29.11 │ 0.36 │ 0.03 │
│ naive_lpd │ 364.34 │ 28.90 │ 0.36 │ 0.03 │
│ p_eff │ 5.01 │ 0.32 │ 0.01 │ 0.00 │
└───────────┴────────┴──────────┴───────┴─────────┘
Model 2
loo2 = psis_loo(model_LBA(data, 0.3), chain_lba2)
Results of PSIS-LOO-CV with 1000 Monte Carlo samples and 1000 data points. Total Monte Carlo SE of 0.077.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ cv_elpd │ 378.04 │ 28.87 │ 0.38 │ 0.03 │
│ naive_lpd │ 381.63 │ 28.71 │ 0.38 │ 0.03 │
│ p_eff │ 3.59 │ 0.26 │ 0.00 │ 0.00 │
└───────────┴────────┴──────────┴───────┴─────────┘
Model 3
loo3 = psis_loo(model_LBA(data, 1.0), chain_lba3)
Results of PSIS-LOO-CV with 1000 Monte Carlo samples and 1000 data points. Total Monte Carlo SE of 0.14.
┌───────────┬────────┬──────────┬───────┬─────────┐
│ │ total │ se_total │ mean │ se_mean │
├───────────┼────────┼──────────┼───────┼─────────┤
│ cv_elpd │ 359.33 │ 29.11 │ 0.36 │ 0.03 │
│ naive_lpd │ 364.34 │ 28.90 │ 0.36 │ 0.03 │
│ p_eff │ 5.01 │ 0.32 │ 0.01 │ 0.00 │
└───────────┴────────┴──────────┴───────┴─────────┘
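PSIS-LOO estimates are only trustworthy when the Pareto k diagnostics are small; values above roughly 0.7 indicate unreliable importance weights (Vehtari et al., 2017). Below is a minimal sketch of such a check; the field names psis_object and pareto_k are assumptions about ParetoSmooth.jl's result struct, so consult the package documentation if they differ in your version.
# Count observations with problematic Pareto k values for one of the models.
# NOTE: `psis_object` and `pareto_k` are assumed field names; adjust if needed.
ks = loo2.psis_object.pareto_k
println("observations with Pareto k > 0.7: ", count(>(0.7), ks))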
Compare Models
Finally, we can compare the models using the loo_compare function:
loo_compare((lba1 = loo1, lba2 = loo2, lba3 = loo3))
┌──────┬─────────┬────────┬────────┐
│ │ cv_elpd │ cv_avg │ weight │
├──────┼─────────┼────────┼────────┤
│ lba2 │ 0.00 │ 0.00 │ 1.00 │
│ lba3 │ -18.71 │ -0.02 │ 0.00 │
│ lba1 │ -24.53 │ -0.02 │ 0.00 │
└──────┴─────────┴────────┴────────┘
In the table above, the cv_elpd column gives each model's ELPD relative to the best model, and the weight column gives a normalized model weight. The comparison correctly identifies the data-generating model: lba2, which uses the true value k = 0.3, receives essentially all of the weight. Note that some researchers have criticized model comparison metrics such as leave-one-out cross-validation; see Gronau and Wagenmakers (2019) for more information.
References
Gronau, Q. F., & Wagenmakers, E.-J. (2019). Limitations of Bayesian leave-one-out cross-validation for model selection. Computational Brain & Behavior, 2, 1–11.
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432.