Binomial Example

This simple example using a binomial model will guide you through the process of performing Bayesin parameter estimation with Differential Evolution MCMC. Suppose we observe $N$ samples from a random binomial process and observe $k$ successes. Formally, this can be stated as:

\[k \sim \mathrm{Binomial}(N, \theta),\]

where $\theta$ is a parameter representing the probability of success. Our goal is to estimate the probability of success $\theta$ from the $k$. Let's assume the prior distribution of $theta$ is given by:

\[\theta \sim \mathrm{beta}(1, 1).\]

In this simple case, the posterior distribution of $\theta$ has a simple closed-form solution:

\[\theta_{|k} \sim \mathrm{beta}(1 + k, 1 + N - k).\]

We will use this fact to verify that the MCMC sampler is working correctly.

Load Packages

Our first step is to load the required packages including StatsPlots.jl for plotting the MCMC chain.

using DifferentialEvolutionMCMC
using Random
using Distributions
using StatsPlots
Random.seed!(88484)

Random.TaskLocalRNG()

Define Data

Let's assume we make $N=10$ observations which produce $k=6$ successes.

data = (N = 10,k = 6)

(N = 10, k = 6)

The true posterior distribution is given by:

true_posterior = Beta(1 + 6, 1 + 4)

Distributions.Beta{Float64}(α=7.0, β=5.0)

The maximum likelihood estimate of $\theta$ is

\[\hat{\theta}_\mathrm{MLE} = \frac{k}{N} = 0.60\]

The mean of the posterior of $\theta$ will be slightly less because our prior distribution encodes one failure and one success:

mean(true_posterior)

0.5833333333333334

Define Prior Log Likelihood Function

The next step is to define a function that passes our parameter $\theta$ and evaluates the log likelihood. The function to compute the prior log likelihood is as follows:

prior_loglike(θ) = logpdf(Beta(1, 1), θ)

prior_loglike (generic function with 1 method)

Define Sample Prior Distribution

Each particle in DifferentialEvolution must be seeded with an initial value. To do so, we define a function that returns the initial value. A common approach is to use the prior distribution as it is intended to encode likely values of $\theta$. The function for sampling from the prior distribution of $\theta$ is given in the following code block:

sample_prior() = rand(Beta(1, 1))

sample_prior (generic function with 1 method)

Define Log Likelihood Function

Next, we define a function to compute the log likelihood of the data. The first argument must be the data followed by a separate argument for each parameter, maintaining the same order specified in prior_loglike. In our case, we can write the log likelihood function as follows:

function loglike(data, θ)
    (;N,k) = data
    return logpdf(Binomial(N, θ), k)
end

loglike (generic function with 1 method)

Define Bounds and Parameter Names

We must define the lower and upper bounds of each parameter. The bounds are a Tuple of tuples, where the $i^{\mathrm{th}}$ inner tuple contains the lower and upper bounds of the $i^{\mathrm{th}}$ parameter. Note that the parameters must be in the same order as specified in prior_loglike and loglike. In the present case, we only have a single parameter $\theta$:

bounds = ((0,1),)

((0, 1),)

We can also pass a Tuple of names for each parameter, again in the same order as specified in prior_loglike and loglike:

names = (:θ,)

(:θ,)

Define Model Object

Now that we have defined the necessary components of our model, we will organize them into a DEModel object as follows:

model = DEModel(;
    sample_prior,
    prior_loglike,
    loglike,
    data,
    names)

DEModel{DifferentialEvolutionMCMC.var"#8#10"{typeof(Main.loglike), NamedTuple{(:N, :k), Tuple{Int64, Int64}}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Tuple{}}, DifferentialEvolutionMCMC.var"#7#9"{typeof(Main.prior_loglike)}, Tuple{Symbol}, typeof(Main.sample_prior)}(DifferentialEvolutionMCMC.var"#7#9"{typeof(Main.prior_loglike)}(Main.prior_loglike), DifferentialEvolutionMCMC.var"#8#10"{typeof(Main.loglike), NamedTuple{(:N, :k), Tuple{Int64, Int64}}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Tuple{}}(Main.loglike, (N = 10, k = 6), Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}(), ()), Main.sample_prior, (:θ,))

Define Sampler

Next, we will create a sampler with the constructor DE. Here, we will pass the sample_prior function and the variable bounds, which constains the lower and upper bounds of each parameter. In addition, we will specify $1,000$ burnin samples, Np=3 particles per group (default 4 particles per group) and proposal noise of σ=.01.

de = DE(;sample_prior, bounds, burnin=1000, Np=3, σ=.01)

DE{Tuple{Tuple{Int64, Int64}}, typeof(random_gamma), typeof(DifferentialEvolutionMCMC.mh_update!), typeof(compute_posterior!), typeof(sample), DifferentialEvolutionMCMC.var"#3#5", Vector{Bool}, Array{Float64, 3}}(4, 3, 1000, true, 0.1, 0.1, 0.001, 0.01, 1.0, 0.0, ((0, 1),), 0, 1, DifferentialEvolutionMCMC.random_gamma, DifferentialEvolutionMCMC.mh_update!, DifferentialEvolutionMCMC.compute_posterior!, StatsBase.sample, DifferentialEvolutionMCMC.var"#3#5"(), Bool[0], Array{Float64, 3}(undef, 0, 0, 0))

Estimate Parameter

The code block below runs the sampler for $2000$ iterations with each group of particles running on a separate thread. The progress bar is also set to display.

n_iter = 2000
chains = sample(model, de, MCMCThreads(), n_iter, progress=true)

Chains MCMC chain (1000×3×12 Array{Float64, 3}):

Iterations        = 1:1:1000
Number of chains  = 12
Samples per chain = 1000
parameters        = θ
internals         = acceptance, lp

Summary Statistics
  parameters      mean       std      mcse    ess_bulk    ess_tail      rhat   ⋯
      Symbol   Float64   Float64   Float64     Float64     Float64   Float64   ⋯

           θ    0.5811    0.1393    0.0041   1139.6076   1963.7912    1.0061   ⋯
                                                                1 column omitted

Quantiles
  parameters      2.5%     25.0%     50.0%     75.0%     97.5%
      Symbol   Float64   Float64   Float64   Float64   Float64

           θ    0.2997    0.4825    0.5857    0.6800    0.8424

Evaluation

Convergence

We will evaluate convergence using two methods. First, we will verify that $\hat{r} \leq 1.05$ in the chain summary above. Indeed, it is. Its also important to examine the trace plot for any evidence of non-stationarity, or getting stuck at the same value for long periods of time. In the left panel of the plot below, the samples for each chain display the characteristic behavior indicative of efficient sampling and convergence: they look like white noise, or a "hairy caterpillar". The plot on the right shows the posterior distribution of $\theta$ for each of the 12 chains.

plot(chains, grid=false)

Accuracy

Let's also see if the results are similar to the closed-form solution. Below, we see that the mean and standard deviation are similar to the chain summary in the Estimate Parameter section, suggesting that the MCMC sampler is working as expected.

mean(true_posterior)

0.5833333333333334

std(true_posterior)

0.13673544235706117