Example
In this simple example, parameter space partitioning is applied to a cube with regions of equal volume.
Load Dependencies
The first step is to load the dependencies into your session.
using Distributions
using ParameterSpacePartitions
using ParameterSpacePartitions.TestModels
using Random
using DataFrames
Random.seed!(544)
Random.TaskLocalRNG()
Model Function
Next, we define a model that accepts a vector of parameters and returns data predictions. The function model is imported above from TestModels, but is presented below for illustration. In this simple example, the model returns the parameter inputs. In a more substantive model, the returned value would be the model's predictions.
model(parms, args...; kwargs...) = parms
model (generic function with 1 method)
Pattern Function
A second function categorizes the predicted data into a qualitative pattern. At minimum, the pattern function must receive a data input. In this example, the pattern function p_fun is imported from TestModels; the function is presented below for illustration. p_fun requires the location (i.e., the parameters) and a HyperCube object, which contains the partition boundaries (the boundaries are the same for each dimension). p_fun categorizes location on each dimension and returns a vector representing a pattern. For example, the pattern [1,2,1] indicates the location is in the partition for which the x axis is 1, the y axis is 2, and the z axis is 1.
function p_fun(location, hypercube::HyperCube, args...; kwargs...)
    p_bounds = hypercube.p_bounds
    nb = length(p_bounds)
    nd = length(location)
    # partition index along each dimension; -100 flags an unassigned dimension
    vals = fill(-100, nd)
    for j in 1:nd
        # find the partition interval containing location[j]
        for i in 1:(nb - 1)
            if (location[j] ≥ p_bounds[i]) && (location[j] ≤ p_bounds[i + 1])
                vals[j] = i
                break
            end
        end
    end
    return vals
end
p_fun (generic function with 1 method)
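As a quick check, we can reproduce the example pattern from above by calling p_fun directly. The location and boundary values below are chosen for illustration; the boundaries are defined formally in the Boundaries section.
# 0.25 falls in the first partition [0, .5] on the x and z axes,
# and 0.75 falls in the second partition [.5, 1] on the y axis
p_fun([0.25, 0.75, 0.25], HyperCube([0.0, 0.5, 1.0]))
# returns [1, 2, 1]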
Model Configuration
Below, we will set n_dims and n_part to create a cube with a total of 2^3 = 8 regions.
# dimensions of the hypercube
n_dims = 3
# partitions per dimension
n_part = 2
2
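The region count can be verified directly from these settings:
# each of the 3 dimensions is split into 2 partitions
n_part ^ n_dims
# returns 8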
Boundaries
In the code below, bounds contains the upper and lower boundaries for each dimension. We must also define a HyperCube object containing the boundaries for each partition, which are stored in a variable called p_bounds. In this example, the partitions are equally spaced along each dimension of the unit cube. Alternatively, we could use unequal spacing to increase the difficulty of exploring the parameter space (a sketch follows the code below).
# lower and upper bounds for each parameter
bounds = fill((0, 1), n_dims)
# partition boundaries along each dimension
p_bounds = range(0, 1, n_part + 1)
hypercube = HyperCube(p_bounds)
ParameterSpacePartitions.TestModels.HyperCube([0.0, 0.5, 1.0])
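A sketch of the unequal-spacing alternative mentioned above: shifting the internal boundary away from 0.5 yields one narrow and one wide partition per dimension. The variable names below are hypothetical.
# hypothetical: unequal partition widths, giving a narrow region [0, .2]
# and a wide region [.2, 1] along each dimension
p_bounds_unequal = [0.0, 0.2, 1.0]
hypercube_unequal = HyperCube(p_bounds_unequal)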
Starting Points
The sampling algorithm requires a starting point to begin exploring the parameter space. The starting points must be wrapped in a vector. The starting points are sampled uniformly within the unit cube, using bounds to ensure each starting point falls within the allowable ranges. Although one starting point is sufficient for the present example, seeding the sampling algorithm with multiple starting points can improve performance.
# number of starting points
n_start = 1
# sample function
sample(bounds) = map(b -> rand(Uniform(b...)), bounds)
# initial starting points
init_parms = map(_ -> sample(bounds), 1:n_start)
# names for the parameter columns in the results
parm_names = [:p1, :p2, :p3]
3-element Vector{Symbol}:
:p1
:p2
:p3
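Seeding multiple chains, as mentioned above, only requires sampling additional starting points. The variable name and the count of five below are arbitrary.
# hypothetical: five uniformly sampled starting points instead of one
init_parms_multi = map(_ -> sample(bounds), 1:5)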
Option Configuration
We can configure options that define and improve the performance of the algorithm. The search radius is an important setting: the challenge is to balance the tradeoff between exploration and exploitation. If the radius is too small, the algorithm will stay in one region (or a sub-space of a region) and fail to find new regions. By contrast, if the radius is too large, many regions will be found, but they will not be well defined. Pitt et al. noted that an acceptance rate of 20% may work well in many cases, but this is a heuristic rather than a hard requirement. The options object also stores the bounds and initial parameters. Threading can be enabled by setting parallel=true (a sketch follows the options output below). Some exploration revealed that threading becomes advantageous once the evaluation time reaches 1 millisecond or longer; otherwise, threading overhead will reduce the speed of the algorithm.
options = Options(;
    radius = .10,
    bounds,
    n_iters = 1000,
    init_parms
)
Options{Bool, Float64, Vector{Tuple{Int64, Int64}}, Vector{Int64}, Int64, typeof(ParameterSpacePartitions.eval_patterns), ParameterSpacePartitions.var"#14#16"{typeof(adapt!), Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, Vector{Vector{Float64}}, Int64, Vector{Symbol}, Int64, Int64}(false, 0.1, [(0, 1), (0, 1), (0, 1)], [1, 1, 1], 1000, ParameterSpacePartitions.eval_patterns, ParameterSpacePartitions.var"#14#16"{typeof(adapt!), Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}(ParameterSpacePartitions.adapt!, Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}()), [[0.30380893217827787, 0.2383018801670267, 0.12387181523971857]], 3, [:p1, :p2, :p3], 0, 0)
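As noted above, threading can be enabled by constructing the options with parallel = true, assuming Julia was started with multiple threads (e.g., julia --threads=auto). A minimal sketch; the variable name is hypothetical:
# hypothetical: the same options with multithreaded exploration enabled
options_parallel = Options(;
    radius = .10,
    bounds,
    n_iters = 1000,
    init_parms,
    parallel = true
)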
It is also possible to pass a custom adaptation function via the keyword adapt_radius!. By default, the adaptation function adjusts the radius to achieve a 40% acceptance rate. Additional information for configuring the default adaptation function can be found via the help feature:
? adapt!
Find Partitions
Now that the requisite functions and options have been specified, we can explore the parameter space. The function find_partitions accepts the model function, the pattern function p_fun, the options object, and additional arguments and keyword arguments for p_fun. In this case, hypercube is passed as an additional argument for p_fun.
results = find_partitions(
    model,
    p_fun,
    options,
    hypercube,
)
first(results)
Row | chain_id | pattern | p1 | p2 | p3 | acceptance | radius
--- | --- | --- | --- | --- | --- | --- | ---
 | Int64 | Array… | Float64 | Float64 | Float64 | Bool | Float64
1 | 3 | [1, 2, 2] | 0.387997 | 0.540261 | 0.544291 | true | 0.1
results is a vector of Results objects containing the following information:

- iter: the iteration of the algorithm
- chain_id: an index of the chain, e.g., 2 is the second chain
- parms: a vector of parameters
- pattern: the target pattern of the chain
- acceptance: a vector indicating whether a proposal was accepted. If accepted, parms is the accepted proposal. If not accepted, parms is the same as the previous iteration.
Results
The next code block finds the minimum and maximum of each parameter within each partition.
df = DataFrame(results)
groups = groupby(df, :pattern)
summary = combine(groups,
    :p1 => minimum, :p1 => maximum,
    :p2 => minimum, :p2 => maximum,
    :p3 => minimum, :p3 => maximum
) |> sort
Row | pattern | p1_minimum | p1_maximum | p2_minimum | p2_maximum | p3_minimum | p3_maximum
--- | --- | --- | --- | --- | --- | --- | ---
 | Array… | Float64 | Float64 | Float64 | Float64 | Float64 | Float64
1 | [1, 1, 1] | 0.000328433 | 0.499494 | 0.000398249 | 0.499869 | 0.00316608 | 0.499197
2 | [1, 1, 2] | 0.00724318 | 0.499225 | 0.00126221 | 0.499246 | 0.500418 | 0.999058
3 | [1, 2, 1] | 0.000643857 | 0.499773 | 0.501071 | 0.999353 | 0.000408015 | 0.499862
4 | [1, 2, 2] | 0.00107225 | 0.499707 | 0.500181 | 0.997551 | 0.500511 | 0.999968
5 | [2, 1, 1] | 0.501703 | 0.996742 | 0.00484655 | 0.499582 | 8.36059e-5 | 0.499978
6 | [2, 1, 2] | 0.501122 | 0.999319 | 0.000114055 | 0.499156 | 0.500959 | 0.997596
7 | [2, 2, 1] | 0.500822 | 0.999124 | 0.50094 | 0.999829 | 2.6546e-5 | 0.499569
8 | [2, 2, 2] | 0.504304 | 0.999564 | 0.501617 | 0.998069 | 0.500387 | 0.999823
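Because acceptance records whether each proposal was accepted, the overall acceptance rate can be compared against the targets discussed in the option configuration section. This is a quick diagnostic, not part of the original workflow:
# proportion of accepted proposals across all chains and iterations
sum(df.acceptance) / nrow(df)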
Volume Estimation
The function estimate_volume approximates the volume of each region using an ellipsoid based on the covariance of the sampled points in the region. As expected, the volume proportion estimates are close to 1/8 = .125.
groups = groupby(df, :pattern)
# estimate the volume of each region from the sampled points in that region
df_volume = combine(
    groups,
    x -> estimate_volume(
        model,
        p_fun,
        x,
        bounds,
        hypercube;
        parm_names,
    )
)
# normalize the raw estimates (stored in column x1) into proportions
df_volume.volume = df_volume.x1 / sum(df_volume.x1)
8-element Vector{Float64}:
0.12445824483220003
0.12672956237345967
0.1250359295395628
0.12328719494749203
0.12374104422289106
0.12469657572233146
0.13307144291329803
0.11898000544876494
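As a quick sanity check, the estimated proportions should sum to one and deviate only slightly from the theoretical value of 1/8:
# the proportions sum to 1 by construction
sum(df_volume.volume)
# deviations from the theoretical proportion 1/8
df_volume.volume .- 1/8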