Example

In this simple example, parameter space partitioning is applied to a cube with regions of equal volume.

Load Dependencies

The first step is to the load dependences into your session.

using Distributions
using ParameterSpacePartitions
using ParameterSpacePartitions.TestModels
using Random
using DataFrames
Random.seed!(544)

Random.TaskLocalRNG()

Next, we define a model that accepts a vector of parameters and returns data predictions. The function model is imported above from TestModels, but is presented below for illustration. In this simple example, the model returns the parameter inputs. In more substantive model, the returned value will be the model predictions.

model(parms, args...; kwargs...) = parms

model (generic function with 1 method)

Pattern Function

A second function categorizes the predicted data into a qualitative pattern. At minimum, the pattern function must recieve a data input. In this example, the pattern function p_fun is imported from TestModels. The function is presented below for illustration. p_fun requires the location (i.e. parameters) and HyperCube object, which contains partition boundaries (which is the same for each dimension). p_fun categorizes location on each dimension and returns a vector representing a pattern. For example, the pattern [1,2,1] indicates the location is in the partition for which the x axis is 1, the y axis is 2, and the z axis is 1.

function p_fun(location, hypercube::HyperCube, args...; kwargs...)
    p_bounds = hypercube.p_bounds
    nb = length(p_bounds)
    nd = length(location)
    vals = fill(-100, nd)
    for j in 1:nd
        for i in 1:(nb-1)
            if (location[j] ≥ p_bounds[i]) && (location[j] ≤ p_bounds[i+1])
                vals[j] = i
                continue
            end
        end
    end
    return vals
end

p_fun (generic function with 1 method)

Model Configuration

Below, we will set n_dims and n_part to create a cube with a total of 2^3 = 8 regions.

# dimensions of the hypbercue
n_dims = 3
# partitions per dimension
n_part = 2

Boundaries

In the code below, bounds contains the upper and lower boundaries for each dimension. In addition, we must also define a HyperCube object containing the boundaries for each partition. The boundaries are stored in a variable called p_bounds. In this example, the partitions are equally spaced along each dimension of the unit cube. Alternatively, we could use unequal spacing to increase the difficulty of exploring the parameter space.

# partition boundaries
bounds = fill((0, 1), n_dims)
p_bounds = range(0, 1, n_part + 1)
hypercube = HyperCube(p_bounds)

ParameterSpacePartitions.TestModels.HyperCube([0.0, 0.5, 1.0])

Starting Points

The sampling algorim requires a starting point to begin exploring the parameter space. The starting points must be wrapped in a vector. The starting points are sampled uniformly within the unit cube, using bounds to ensure the starting point is within allowable ranges. Although one starting point is sufficient for the present example, seeding the sampling algorithm with multiple starting points can improve performance.

# number of starting points
n_start = 1
# sample function
sample(bounds) = map(b -> rand(Uniform(b...)), bounds)
# initial starting points
init_parms = map(_ -> sample(bounds), 1:n_start)
parm_names = [:p1,:p2,:p3]

3-element Vector{Symbol}:
 :p1
 :p2
 :p3

Option Configuration

We can configure options to define and improve the performance of the algorithm. The search radius is an important configuration. The challenge is to balance the tradeoff between exploration and exploitation. If the radius is too small, it will stay in one region (or a sub-space of a region), and will fail to find new regions. By contrast, if the radius is too large, many regions will be found, but will not be well defined. Pitt et al. noted that an acceptance rate of 20% may work well in many cases, but this is a heuristic rather than a hard requirement. The options also stores the bounds and initial parameters. Threading can be enabled by setting parallel=true. Some exploration revealed that threading becomes advantageous once the evaluation time reaches 1 millisecond or longer. Otherwise, threading overhead will reduce the speed of the algorithm.

options = Options(;
    radius = .10,
    bounds,
    n_iters = 1000,
    init_parms
)

Options(false, 0.1, [(0, 1), (0, 1), (0, 1)], [1, 1, 1], 1000, ParameterSpacePartitions.eval_patterns, ParameterSpacePartitions.var"#14#16"{typeof(adapt!), Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}}(ParameterSpacePartitions.adapt!, Base.Pairs{Symbol, Union{}, Tuple{}, @NamedTuple{}}()), [[0.30380893217827787, 0.2383018801670267, 0.12387181523971857]], 3, [:p1, :p2, :p3], 0, 0)

It is also possible to pass a custom adaption function via the keyword adapt_radius!. By default, the adaption function adjusts the radius to achieve a 40% acceptance rate. Additional information for configuring the default adaption function can be found via the help feature:

? adapt!

Find Partitions

Now that the requisite functions and options have been specified, we can now explore the parameter space. The function find_partitions accepts the model function, the pattern function p_fun, the options object, and additional arguments and keyword arguments for p_fun.

results = find_partitions(
    model,
    p_fun,
    options,
    hypercube,
)
first(results)

DataFrameRow (7 columns)

Row	chain_id	pattern	p1	p2	p3	acceptance	radius
	Int64	Array…	Float64	Float64	Float64	Bool	Float64
1	5	[2, 2, 2]	0.508784	0.609249	0.615742	true	0.1

results is a vector of Results objects containing the following information:

iter: the iteration of the algorithm
chain_id: an index of the chain, i.e. 2 is the second chain
parms: a vector of parameters
pattern: the target pattern of the chain
acceptance: a vector indicating whether a proposal was accepted. If accepted, parms is the accepted proposal. If not accepted, parms is the same as the previous iteration.

Results

The next code block finds the minimum and maximum of each partition.

df = DataFrame(results)
groups = groupby(df, :pattern)
summary = combine(groups,
    :p1 => minimum, :p1 => maximum,
    :p2 => minimum, :p2 => maximum,
    :p3 => minimum, :p3 => maximum
) |> sort

8×7 DataFrame

Row	pattern	p1_minimum	p1_maximum	p2_minimum	p2_maximum	p3_minimum	p3_maximum
	Array…	Float64	Float64	Float64	Float64	Float64	Float64
1	[1, 1, 1]	0.000699614	0.498769	0.00128179	0.499422	0.00173273	0.499121
2	[1, 1, 2]	0.00048797	0.499843	0.00111479	0.49942	0.500167	0.998902
3	[1, 2, 1]	0.00176001	0.499285	0.500945	0.99823	0.00200899	0.497425
4	[1, 2, 2]	0.00185899	0.499654	0.500045	0.999574	0.500592	0.999883
5	[2, 1, 1]	0.500259	0.995743	0.000203763	0.499582	0.0020597	0.499978
6	[2, 1, 2]	0.500138	0.995866	0.00101571	0.498828	0.500409	0.999958
7	[2, 2, 1]	0.500599	0.999166	0.50054	0.999018	2.6546e-5	0.498384
8	[2, 2, 2]	0.50006	0.99682	0.502409	0.9999	0.500387	0.997195

Volume Estimation

The function estimate_volume approximates the volume of each region using an ellipsoid based on the covariance of sampled points in the region. As expected, the volume percentage estimates are close to 1/8 = .125.

groups = groupby(df, :pattern)

df_volume = combine(
    groups,
    x -> estimate_volume(
        model,
        p_fun,
        x,
        bounds,
        hypercube;
        parm_names,
    )
)

df_volume.volume = df_volume.x1 / sum(df_volume.x1)

8-element Vector{Float64}:
 0.12252338872079413
 0.1231516583788911
 0.13170491050931526
 0.12827826180139731
 0.1312943950629638
 0.12075405957634228
 0.13414864496184842
 0.10814468098844769