Adaptive distance

ABC provides a means to sample an approximate posterior distribution over unknown parameters based on a comparison between observed and simulated data. This comparison is often based on the distance between features that summarise the data and are informative about the parameter values.

Here we assume that the summaries calculated from the observed and simulated data are compared with a weighted distance, with weights \(w_i=1/\sigma_i\) calculated based on their standard deviations \(\sigma_i\). This ensures that the selected summaries have an equal contribution to the distance between observed and simulated data.
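Written out, and assuming the weighted Euclidean form used in [1], the distance between observed summaries \(S^{obs}\) and simulated summaries \(S^{sim}\) is

\[d\left(S^{obs}, S^{sim}\right) = \sqrt{\sum_i \left(w_i\left(S_i^{obs} - S_i^{sim}\right)\right)^2}, \qquad w_i = \frac{1}{\sigma_i}.\]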

This notebook studies adaptive distance SMC-ABC where \(\sigma_i\) and \(w_i\) are recalculated between SMC iterations as proposed in [1].

import numpy as np
import scipy.stats as ss
import matplotlib.pyplot as plt
%matplotlib inline
import elfi

Example 1:

Assume we have an unknown parameter with prior distribution \(\theta\sim U(0,50)\) and two simulator outputs \(S_1\sim N(\theta, 1)\) and \(S_2\sim N(\theta, 100)\) whose observed values are 20.

def simulator(mu, batch_size=1, random_state=None):

    batches_mu = np.asarray(mu).reshape((-1, 1))

    # simulate S_1 ~ N(theta, 1) and S_2 ~ N(theta, 100)
    obs_1 = ss.norm.rvs(loc=batches_mu, scale=1, random_state=random_state).reshape((-1, 1))
    obs_2 = ss.norm.rvs(loc=batches_mu, scale=100, random_state=random_state).reshape((-1, 1))

    return np.hstack((obs_1, obs_2))

observed_data = np.array([20, 20])[None, :]

Here the simulator outputs are both informative about the unknown model parameter, but \(S_2\) has more observation noise than \(S_1\). We do not calculate separate summaries in this example, but compare the observed and simulated data directly based on these two output variables.

The Euclidean distance between observed and simulated outputs or summaries can be used to find parameter values that could produce the observed data. Here we describe the dependencies between the unknown parameter value and the observed data as an ELFI model m, and sample the approximate posterior distribution with the rejection sampler.

m = elfi.new_model()
theta = elfi.Prior(ss.uniform, 0, 50, model=m)
sim = elfi.Simulator(simulator, theta, observed=observed_data)
d = elfi.Distance('euclidean', sim)
rej = elfi.Rejection(d, batch_size=10000, seed=123)

Let us sample 100 parameters with quantile=0.01. This means that we sample 10000 candidate parameters from the prior distribution and take the 100 parameters that produce simulated data closest to the observed data.

sample = rej.sample(100, quantile=0.01)
Progress [==================================================] 100.0% Complete
sample
Method: Rejection
Number of samples: 100
Number of simulations: 10000
Threshold: 6.66
Sample means: theta: 19.6
plt.hist(sample.samples_array,range=(0,50),bins=20)
plt.xlabel('theta');
https://raw.githubusercontent.com/elfi-dev/notebooks/dev/figures/adaptive_distance_files/adaptive_distance_12_0.png

The approximate posterior sample is concentrated around \(\theta=20\) as expected in this example. However, the sample distribution is much wider than we would observe if the sample were selected based on \(S_1\) alone.
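As a quick check of this claim, one could restrict the comparison to \(S_1\) with a summary node and rerun rejection sampling. This snippet is a sketch rather than part of the original analysis, and the names first_output, S1, d1, rej1 and sample_S1 are ours:

def first_output(x):
    # keep only S_1, the first column of the simulator output
    return x[:, :1]

S1 = elfi.Summary(first_output, sim)
d1 = elfi.Distance('euclidean', S1)
rej1 = elfi.Rejection(d1, batch_size=10000, seed=123)
sample_S1 = rej1.sample(100, quantile=0.01)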

Now let us test adaptive distance in the same example.

First we switch the distance node d to an adaptive distance node and initialise adaptive distance SMC-ABC. Initialisation is identical to the rejection sampler, and here we use the same batch size and seed as earlier, so that the methods are presented with the exact same candidate parameters.

d.become(elfi.AdaptiveDistance(sim))
ada_smc = elfi.AdaptiveDistanceSMC(d, batch_size=10000, seed=123)

Since this is an iterative method, we must decide both the sample size (n_samples) and the number of populations to sample (rounds). In addition, we can choose the \(\alpha\) quantile (quantile) used in threshold estimation.

Each population of n_samples parameter values is sampled as follows (a schematic sketch follows this list):

1. n_samples/quantile parameters are sampled from the current proposal distribution, with the acceptance threshold determined based on the previous population.
2. The distance measure is updated based on the observed sample, and the n_samples parameters with the smallest distance are selected as the new population.

The first population is sampled from the prior distribution, and all parameters are accepted in step 1.
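The sketch below mirrors these steps in plain NumPy. It is a simplified illustration rather than ELFI's implementation: the proposal distribution and threshold handling are reduced to their simplest forms, and all names in it are ours.

import numpy as np

def adaptive_smc_sketch(sample_prior, simulate, s_obs, n_samples, rounds,
                        quantile=0.5, seed=None):
    rng = np.random.default_rng(seed)
    n_sim = int(n_samples / quantile)
    population = None
    for r in range(rounds):
        if population is None:
            # round 1: propose from the prior and accept all candidates
            theta = sample_prior(n_sim, rng)
        else:
            # simplified proposal: perturb a resample of the previous population
            theta = rng.choice(population, n_sim) + rng.normal(0.0, population.std(), n_sim)
        s_sim = simulate(theta, rng)              # shape (n_sim, n_outputs)
        w = 1.0 / s_sim.std(axis=0)               # recalculate weights from this round
        dist = np.sqrt((((s_sim - s_obs) * w) ** 2).sum(axis=1))
        # keep the n_samples closest candidates (in place of an explicit threshold)
        population = theta[np.argsort(dist)[:n_samples]]
    return population

In ELFI the corresponding functionality is provided by elfi.AdaptiveDistance together with elfi.AdaptiveDistanceSMC, as used below.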

Here we sample one population with quantile=0.01. This means that the total simulation count will be the same as with the rejection sampler, but now the distance function is updated based on the 10000 simulated observations, and the 100 parameters included in the posterior sample are selected based on the new distance measure.

sample_ada = ada_smc.sample(100, 1, quantile=0.01)
ABC-SMC Round 1 / 1
Progress [==================================================] 100.0% Complete
sample_ada
Method: AdaptiveDistanceSMC
Number of samples: 100
Number of simulations: 10000
Threshold: 0.462
Sample means: theta: 19.8
plt.hist(sample_ada.samples_array,range=(0,50),bins=20)
plt.xlabel('theta');
https://raw.githubusercontent.com/elfi-dev/notebooks/dev/figures/adaptive_distance_files/adaptive_distance_19_0.png

We see that the posterior distribution over unknown parameter values is narrower than in the previous example. This is because the simulator outputs are now normalised based on their estimated standard deviation.

We can inspect the weights \(w_1\) and \(w_2\):

sample_ada.adaptive_distance_w
[array([0.06940134, 0.0097677 ])]
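As a rough sanity check (our addition, not part of the original notebook), these weights are close to the reciprocal standard deviations of the two outputs under the prior predictive distribution:

theta_std = 50 / np.sqrt(12)                 # std of U(0, 50), ~14.43
print(1 / np.sqrt(theta_std**2 + 1**2))      # ~0.069, close to w_1
print(1 / np.sqrt(theta_std**2 + 100**2))    # ~0.0099, close to w_2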

Example 2:

This is the normal distribution example presented in [1].

Here we have an unknown parameter with prior distribution \(\theta\sim N(0,100)\) and two simulator outputs \(S_1\sim N(\theta, 0.1)\) and \(S_2\sim N(1, 1)\) whose observed values are 0.

def simulator(mu, batch_size=1, random_state=None):

    batches_mu = np.asarray(mu).reshape((-1, 1))

    # S_1 ~ N(theta, 0.1) depends on theta; S_2 ~ N(1, 1) does not
    obs_1 = ss.norm.rvs(loc=batches_mu, scale=0.1, random_state=random_state).reshape((-1, 1))
    obs_2 = ss.norm.rvs(loc=1, scale=1, size=batch_size, random_state=random_state).reshape((-1, 1))

    return np.hstack((obs_1, obs_2))

observed_data = np.array([0, 0])[None, :]

\(S_1\) is now informative and \(S_2\) uninformative about the unknown parameter value, and between the two output variables, \(S_1\) has the larger variance under the prior predictive distribution: with \(\mathrm{std}(\theta)=100\) as in the code above, \(\mathrm{std}(S_1)\approx\sqrt{100^2+0.1^2}\approx 100\), whereas \(\mathrm{std}(S_2)=1\). This means that a normalisation estimated based on output data observed in the initial round, or based on a separate sample, would down-weight the informative output \(S_1\) by a factor of roughly 100 and hence would not work well in this example.

Let us define a new model and initialise adaptive distance SMC-ABC.

m = elfi.new_model()
theta = elfi.Prior(ss.norm, 0, 100, model=m)
sim = elfi.Simulator(simulator, theta, observed=observed_data)
d = elfi.AdaptiveDistance(sim)
ada_smc = elfi.AdaptiveDistanceSMC(d, batch_size=2000, seed=123)

Next we sample 1000 parameter values in 5 rounds with the default quantile=0.5, which is recommended for sequential estimation [1]:

sample_ada = ada_smc.sample(1000, 5)
ABC-SMC Round 1 / 5
Progress [==================================================] 100.0% Complete
ABC-SMC Round 2 / 5
Progress [==================================================] 100.0% Complete
ABC-SMC Round 3 / 5
Progress [==================================================] 100.0% Complete
ABC-SMC Round 4 / 5
Progress [==================================================] 100.0% Complete
ABC-SMC Round 5 / 5
Progress [==================================================] 100.0% Complete
sample_ada
Method: AdaptiveDistanceSMC
Number of samples: 1000
Number of simulations: 32000
Threshold: 0.925
Sample means: theta: -0.195
plt.hist(sample_ada.samples_array, range=(-25,25), bins=20)
plt.xlabel('theta');
https://raw.githubusercontent.com/elfi-dev/notebooks/dev/figures/adaptive_distance_files/adaptive_distance_31_0.png

The sample distribution is concentrated around \(\theta=0\) but wider than could be expected. However, we can continue the iterative estimation process. Here we sample two more populations:

sample_ada = ada_smc.sample(1000, 2)
ABC-SMC Round 6 / 7
Progress [==================================================] 100.0% Complete
ABC-SMC Round 7 / 7
Progress [==================================================] 100.0% Complete
sample_ada
Method: AdaptiveDistanceSMC
Number of samples: 1000
Number of simulations: 48000
Threshold: 0.868
Sample means: theta: 0.0183
plt.hist(sample_ada.samples_array, range=(-25,25), bins=20)
plt.xlabel('theta');
https://raw.githubusercontent.com/elfi-dev/notebooks/dev/figures/adaptive_distance_files/adaptive_distance_35_0.png

We observe that the sample mean is now closer to zero and the sample distribution is narrower.

Let us examine \(w_1\) and \(w_2\):

sample_ada.adaptive_distance_w
[array([0.01023228, 1.00584519]),
 array([0.00921258, 0.99287166]),
 array([0.01201937, 0.99365522]),
 array([0.02217631, 0.98925365]),
 array([0.04355987, 1.00076738]),
 array([0.07863284, 0.9971017 ]),
 array([0.13892778, 1.00929049])]

We can see that \(w_2\) (second column) stays approximately constant across iterations, whereas \(w_1\) increases as the method learns more about possible parameter values and the proposal distribution becomes more concentrated around \(\theta=0\). In the limit of a proposal concentrated at \(\theta=0\), \(w_1\) would approach \(1/0.1=10\).

Notes

The adaptive distance SMC-ABC method demonstrated in this notebook normalises the simulator outputs, or summaries calculated from them, based on their estimated standard deviation under the proposal distribution in each iteration. This ensures that all outputs or summaries have an equal contribution to the distance between simulated and observed data in all iterations.

It is important to note that the method does not evaluate whether outputs or summaries are needed or informative. In both examples studied in this notebook, results would improve if inference were carried out based on \(S_1\) alone. Hence one should choose the summaries used in adaptive distance SMC-ABC with the usual care. ELFI tools that aid in the selection process are discussed in the ELFI diagnostics notebook.

Reference

[1] Prangle, D. (2017). Adapting the ABC Distance Function. Bayesian Analysis, 12(1):289-309. https://projecteuclid.org/euclid.ba/1460641065