Binomial Example
This simple example using a binomial model will guide you through the process of performing Bayesin parameter estimation with Differential Evolution MCMC. Suppose we observe $N$ samples from a random binomial process and observe $k$ successes. Formally, this can be stated as:
\[k \sim \mathrm{Binomial}(N, \theta),\]
where $\theta$ is a parameter representing the probability of success. Our goal is to estimate the probability of success $\theta$ from the $k$. Let's assume the prior distribution of $theta$ is given by:
\[\theta \sim \mathrm{beta}(1, 1).\]
In this simple case, the posterior distribution of $\theta$ has a simple closed-form solution:
\[\theta_{|k} \sim \mathrm{beta}(1 + k, 1 + N - k).\]
We will use this fact to verify that the MCMC sampler is working correctly.
Load Packages
Our first step is to load the required packages including StatsPlots.jl
for plotting the MCMC chain.
using DifferentialEvolutionMCMC
using Random
using Distributions
using StatsPlots
Random.seed!(88484)
Random.TaskLocalRNG()
Define Data
Let's assume we make $N=10$ observations which produce $k=6$ successes.
data = (N = 10,k = 6)
(N = 10, k = 6)
The true posterior distribution is given by:
true_posterior = Beta(1 + 6, 1 + 4)
Distributions.Beta{Float64}(α=7.0, β=5.0)
The maximum likelihood estimate of $\theta$ is
\[\hat{\theta}_\mathrm{MLE} = \frac{k}{N} = 0.60\]
The mean of the posterior of $\theta$ will be slightly less because our prior distribution encodes one failure and one success:
mean(true_posterior)
0.5833333333333334
Define Prior Log Likelihood Function
The next step is to define a function that passes our parameter $\theta$ and evaluates the log likelihood. The function to compute the prior log likelihood is as follows:
prior_loglike(θ) = logpdf(Beta(1, 1), θ)
prior_loglike (generic function with 1 method)
Define Sample Prior Distribution
Each particle in DifferentialEvolution must be seeded with an initial value. To do so, we define a function that returns the initial value. A common approach is to use the prior distribution as it is intended to encode likely values of $\theta$. The function for sampling from the prior distribution of $\theta$ is given in the following code block:
sample_prior() = rand(Beta(1, 1))
sample_prior (generic function with 1 method)
Define Log Likelihood Function
Next, we define a function to compute the log likelihood of the data. The first argument must be the data followed by a separate argument for each parameter, maintaining the same order specified in prior_loglike
. In our case, we can write the log likelihood function as follows:
function loglike(data, θ)
(;N,k) = data
return logpdf(Binomial(N, θ), k)
end
loglike (generic function with 1 method)
Define Bounds and Parameter Names
We must define the lower and upper bounds of each parameter. The bounds are a Tuple
of tuples, where the $i^{\mathrm{th}}$ inner tuple contains the lower and upper bounds of the $i^{\mathrm{th}}$ parameter. Note that the parameters must be in the same order as specified in prior_loglike
and loglike
. In the present case, we only have a single parameter $\theta$:
bounds = ((0,1),)
((0, 1),)
We can also pass a Tuple
of names for each parameter, again in the same order as specified in prior_loglike
and loglike
:
names = (:θ,)
(:θ,)
Define Model Object
Now that we have defined the necessary components of our model, we will organize them into a DEModel
object as follows:
model = DEModel(;
sample_prior,
prior_loglike,
loglike,
data,
names)
DEModel{DifferentialEvolutionMCMC.var"#8#10"{typeof(Main.loglike), NamedTuple{(:N, :k), Tuple{Int64, Int64}}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Tuple{}}, DifferentialEvolutionMCMC.var"#7#9"{typeof(Main.prior_loglike)}, Tuple{Symbol}, typeof(Main.sample_prior)}(DifferentialEvolutionMCMC.var"#7#9"{typeof(Main.prior_loglike)}(Main.prior_loglike), DifferentialEvolutionMCMC.var"#8#10"{typeof(Main.loglike), NamedTuple{(:N, :k), Tuple{Int64, Int64}}, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, Tuple{}}(Main.loglike, (N = 10, k = 6), Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}(), ()), Main.sample_prior, (:θ,))
Define Sampler
Next, we will create a sampler with the constructor DE
. Here, we will pass the sample_prior
function and the variable bounds
, which constains the lower and upper bounds of each parameter. In addition, we will specify $1,000$ burnin samples, Np=3
particles per group (default 4 particles per group) and proposal noise of σ=.01
.
de = DE(;sample_prior, bounds, burnin=1000, Np=3, σ=.01)
DE{Tuple{Tuple{Int64, Int64}}, typeof(random_gamma), typeof(DifferentialEvolutionMCMC.mh_update!), typeof(compute_posterior!), typeof(sample), DifferentialEvolutionMCMC.var"#3#5", Vector{Bool}, Array{Float64, 3}}(4, 3, 1000, true, 0.1, 0.1, 0.001, 0.01, 1.0, 0.0, ((0, 1),), 0, 1, DifferentialEvolutionMCMC.random_gamma, DifferentialEvolutionMCMC.mh_update!, DifferentialEvolutionMCMC.compute_posterior!, StatsBase.sample, DifferentialEvolutionMCMC.var"#3#5"(), Bool[0], Array{Float64, 3}(undef, 0, 0, 0))
Estimate Parameter
The code block below runs the sampler for $2000$ iterations with each group of particles running on a separate thread. The progress bar is also set to display.
n_iter = 2000
chains = sample(model, de, MCMCThreads(), n_iter, progress=true)
Chains MCMC chain (1000×3×12 Array{Float64, 3}):
Iterations = 1:1:1000
Number of chains = 12
Samples per chain = 1000
parameters = θ
internals = acceptance, lp
Summary Statistics
parameters mean std mcse ess_bulk ess_tail rhat ⋯
Symbol Float64 Float64 Float64 Float64 Float64 Float64 ⋯
θ 0.5811 0.1393 0.0041 1139.6076 1963.7912 1.0061 ⋯
1 column omitted
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
θ 0.2997 0.4825 0.5857 0.6800 0.8424
Evaluation
Convergence
We will evaluate convergence using two methods. First, we will verify that $\hat{r} \leq 1.05$ in the chain summary above. Indeed, it is. Its also important to examine the trace plot for any evidence of non-stationarity, or getting stuck at the same value for long periods of time. In the left panel of the plot below, the samples for each chain display the characteristic behavior indicative of efficient sampling and convergence: they look like white noise, or a "hairy caterpillar". The plot on the right shows the posterior distribution of $\theta$ for each of the 12 chains.
plot(chains, grid=false)
Accuracy
Let's also see if the results are similar to the closed-form solution. Below, we see that the mean and standard deviation are similar to the chain summary in the Estimate Parameter
section, suggesting that the MCMC sampler is working as expected.
mean(true_posterior)
0.5833333333333334
std(true_posterior)
0.13673544235706117