Implementing a new inference method
This tutorial provides the fundamentals for implementing custom parameter inference methods using ELFI. ELFI provides many features out of the box, such as parallelization or random state handling. In a typical case these happen “automatically” behind the scenes when the algorithms are built on top of the provided interface classes.
The base class for parameter inference methods is the ParameterInference interface, found in the elfi.methods.inference.parameter_inference module. Among the methods in the interface, those that must be implemented raise a NotImplementedError. In addition, you will probably also want to override at least the update and __init__ methods.
Let’s create an empty skeleton for a custom method that includes just the minimal set of methods to create a working algorithm in ELFI:
from elfi.methods.inference.parameter_inference import ParameterInference

class CustomMethod(ParameterInference):
    def __init__(self, model, output_names, **kwargs):
        super(CustomMethod, self).__init__(model, output_names, **kwargs)

    def set_objective(self):
        # Request 3 batches to be generated
        self.objective['n_batches'] = 3

    def extract_result(self):
        return self.state
The extract_result method is called by ELFI at the end of the inference and should return a ParameterInferenceResult object (from the elfi.methods.result module). For illustration, however, we will begin by returning the member state dictionary, which stores all the current state information of the inference. Let's make an instance of our method and run it:
import elfi.examples.ma2 as ma2

# Get a ready-made MA2 model to test our inference method with
m = ma2.get_model()

# We want the outputs from node 'd' of the model `m` to be available
custom_method = CustomMethod(m, ['d'])

# Run the inference
custom_method.infer()  # {'n_batches': 3, 'n_sim': 3000}
Running the above returns the state dictionary. We will find a few keys in it that track basic properties of the state, such as n_batches, which tells how many batches have been generated, and n_sim, which gives the total number of simulations contained in those batches. The latter should be n_batches times the current batch size (custom_method.batch_size, which was 1000 here).
You will find that n_batches in the state dictionary has the value 3. This is because in our CustomMethod.set_objective method we set the n_batches key of the objective dictionary to that value. Every ParameterInference instance has a Python dictionary called objective that is a counterpart to the state dictionary. The objective defines the conditions under which the inference is finished. The default controlling key in that dictionary is the string n_batches, whose value tells ELFI how many batches in total need to be generated from the provided generative ElfiModel. Inference is considered finished when n_batches in the state matches or exceeds that in the objective. The generation of batches is automatically parallelized in the background, so we don't have to worry about it.
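For example, parallel computation of the batches can be enabled simply by switching the ELFI client before running the inference; a minimal sketch, assuming the multiprocessing client that ships with ELFI:

import elfi

# Compute batches in parallel across local processes; the inference
# code itself does not need to change.
elfi.set_client('multiprocessing')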
Note
A batch in ELFI is a dictionary that maps names of nodes of the generative model to their outputs. An output in the batch consists of one or more runs of its operation stored in a numpy array. Each batch has an index, and the outputs in the same batch are guaranteed to be the same if you recompute the batch.
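To make this concrete, a batch for the output names ['d'] with a batch size of 3 might look roughly like the following (the values are purely illustrative):

import numpy as np

# Node names map to numpy arrays holding one value per run in the batch
batch = {'d': np.array([0.73, 0.12, 1.05])}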
The algorithm, however, does nothing else at this point besides generating the 3 batches. To actually do something with them, we can add the update method, which allows us to update the state dictionary of the inference with any custom values. It takes the generated batch dictionary and its index, and is called by ELFI every time a new batch is received. Let's say we wish to filter parameters by a threshold (as in ABC rejection sampling) from the total number of simulations:
class CustomMethod(ParameterInference):
    def __init__(self, model, output_names, **kwargs):
        super(CustomMethod, self).__init__(model, output_names, **kwargs)

        # Hard code a threshold and discrepancy node name for now
        self.threshold = .1
        self.discrepancy_name = output_names[0]

        # Prepare lists to push the filtered outputs into
        self.state['filtered_outputs'] = {name: [] for name in output_names}

    def update(self, batch, batch_index):
        super(CustomMethod, self).update(batch, batch_index)

        # Make a filter mask (logical numpy array) from the distance array
        filter_mask = batch[self.discrepancy_name] <= self.threshold

        # Append the filtered parameters to their lists
        for name in self.output_names:
            values = batch[name]
            self.state['filtered_outputs'][name].append(values[filter_mask])

    ...  # other methods as before

m = ma2.get_model()
custom_method = CustomMethod(m, ['d'])
custom_method.infer()  # {'n_batches': 3, 'n_sim': 3000, 'filtered_outputs': ...}
After running this, the returned state dictionary should contain a filtered_outputs key with the filtered distances for node d from the 3 batches.
Note
The reason for the imposed structure in ParameterInference is to encourage a design where one can advance the inference iteratively using the iterate method. This makes it possible to stop at any point, check the current state, and continue. This is important, as there are usually many moving parts, such as the choice of summary statistics or deciding on a good discrepancy function.
Now, to be useful, we should allow the user to set the different options; requesting 3 batches is not going to take them very far. The user also probably thinks in terms of simulations rather than batches. ELFI allows you to replace the n_batches key with an n_sim key in the objective, to spare you from converting n_sim to n_batches in your code. Just note that n_sim in the state will always be a multiple of batch_size.
Let's modify the algorithm so that the user can pass the threshold, the name of the discrepancy node, and the number of simulations. Let's also add the parameters to the outputs:
class CustomMethod(ParameterInference):
    def __init__(self, model, discrepancy_name, threshold, **kwargs):
        # Create a name list of nodes whose outputs we wish to receive
        output_names = [discrepancy_name] + model.parameter_names
        super(CustomMethod, self).__init__(model, output_names, **kwargs)

        self.threshold = threshold
        self.discrepancy_name = discrepancy_name

        # Prepare lists to push the filtered outputs into
        self.state['filtered_outputs'] = {name: [] for name in output_names}

    def set_objective(self, n_sim):
        self.objective['n_sim'] = n_sim

    ...  # other methods as before

# Run it
custom_method = CustomMethod(m, 'd', threshold=.1, batch_size=1000)
custom_method.infer(n_sim=2000)  # {'n_batches': 2, 'n_sim': 2000, 'filtered_outputs': ...}
Calling the inference method now returns a state dictionary that also contains the filtered parameters from each of the batches. Note that any arguments given to the infer method are passed on to the set_objective method.
Now due to the structure of the algorithm the user can immediately continue from this state:
# Continue inference from the previous state (with n_sim=2000)
custom_method.infer(n_sim=4000)  # {'n_batches': 4, 'n_sim': 4000, 'filtered_outputs': ...}

# Or use it iteratively
custom_method.set_objective(n_sim=6000)

custom_method.iterate()
assert not custom_method.finished

# Investigate the current state
custom_method.extract_result()  # {'n_batches': 5, 'n_sim': 5000, 'filtered_outputs': ...}

custom_method.iterate()
assert custom_method.finished
custom_method.extract_result()  # {'n_batches': 6, 'n_sim': 6000, 'filtered_outputs': ...}
This works because the state is stored in the custom_method instance and we only change the objective. ELFI also calls iterate internally in the infer method.
The last finishing touch to our algorithm is to convert the state dict to a more user-friendly format in the extract_result method. First we convert the lists of filtered arrays from the batches to numpy arrays. We then wrap the result in an elfi.methods.results.Sample object and return it instead of the state dict. Below is the final complete implementation of our inference method class:
import numpy as np

from elfi.methods.inference.parameter_inference import ParameterInference
from elfi.methods.results import Sample

class CustomMethod(ParameterInference):
    def __init__(self, model, discrepancy_name, threshold, **kwargs):
        # Create a name list of nodes whose outputs we wish to receive
        output_names = [discrepancy_name] + model.parameter_names
        super(CustomMethod, self).__init__(model, output_names, **kwargs)

        self.threshold = threshold
        self.discrepancy_name = discrepancy_name

        # Prepare lists to push the filtered outputs into
        self.state['filtered_outputs'] = {name: [] for name in output_names}

    def set_objective(self, n_sim):
        self.objective['n_sim'] = n_sim

    def update(self, batch, batch_index):
        super(CustomMethod, self).update(batch, batch_index)

        # Make a filter mask (logical numpy array) from the distance array
        filter_mask = batch[self.discrepancy_name] <= self.threshold

        # Append the filtered parameters to their lists
        for name in self.output_names:
            values = batch[name]
            self.state['filtered_outputs'][name].append(values[filter_mask])

    def extract_result(self):
        filtered_outputs = self.state['filtered_outputs']
        outputs = {name: np.concatenate(filtered_outputs[name]) for name in self.output_names}

        return Sample(
            method_name='CustomMethod',
            outputs=outputs,
            parameter_names=self.parameter_names,
            discrepancy_name=self.discrepancy_name,
            n_sim=self.state['n_sim'],
            threshold=self.threshold
        )
Running the inference with the above implementation should now produce a user-friendly output:
Method: CustomMethod
Number of posterior samples: 82
Number of simulations: 10000
Threshold: 0.1
Posterior means: t1: 0.687, t2: 0.152
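The returned Sample object can also be inspected programmatically; for instance (assuming Sample exposes the accepted values through its samples dictionary, keyed by parameter name):

result = custom_method.infer(n_sim=10000)
print(result)  # prints the summary shown above

# Accepted parameter values as numpy arrays
t1_samples = result.samples['t1']
print(t1_samples.mean())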
Where to go from here
When implementing your own method, it is advisable to read the documentation of the ParameterInference class. In addition, we recommend reading the Rejection, SMC and/or BayesianOptimization class implementations in the source for some more advanced techniques. These methods demonstrate, for example, how to inject values from outside into the ELFI model (acquisition functions in BayesianOptimization), how to modify the user-provided model to get e.g. the pdfs of the parameters (SMC), and so forth.
Good to know
ELFI guarantees that computing a batch with the same index will always produce the same output, given the same model and ComputationContext object. The ComputationContext object holds the batch size, the seed for the PRNG, and the pool object of precomputed batches of nodes. If your method uses random quantities in the algorithm, please make sure to use the seed attribute of ParameterInference so that your results will be consistent.
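A minimal sketch of how a subclass might do this, building on the CustomMethod above:

import numpy as np

class MyStochasticMethod(CustomMethod):
    def __init__(self, model, discrepancy_name, threshold, **kwargs):
        super(MyStochasticMethod, self).__init__(
            model, discrepancy_name, threshold, **kwargs)
        # Derive all internal randomness from the inference seed so that
        # repeated runs with the same seed give identical results.
        self.random_state = np.random.RandomState(self.seed)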
If you want to provide values for the outputs of certain nodes from outside the generative model, you can return them from the prepare_new_batch method. They will replace any default value or operation in those nodes. This is used e.g. in BOLFI, where values from the acquisition function replace values coming from the prior during the Bayesian optimization phase.
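A sketch of how overriding prepare_new_batch could look; the proposal logic here is hypothetical, and t1 and t2 are the MA2 parameter node names:

import numpy as np

class MyActiveMethod(CustomMethod):
    def prepare_new_batch(self, batch_index):
        # Provide the parameter values ourselves instead of letting them
        # come from the priors. The keys of the returned dict must match
        # node names in the model.
        proposals = np.random.uniform(0, 1, size=(self.batch_size, 2))
        return {'t1': proposals[:, 0], 't2': proposals[:, 1]}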
The ParameterInference instance also has the following helper classes:
BatchHandler
The ParameterInference class instantiates an elfi.client.BatchHandler helper class that is set as the self.batches member variable. This object is in essence a wrapper over the Client interface, making it easier to work with batches that are in computation. Among the duties of BatchHandler are keeping track of the current batch_index and of the status of the batches that have been submitted. You often don't need to interact with it directly.
OutputPool
elfi.store.OutputPool serves a dual purpose:
1. It stores all the computed outputs of selected nodes.
2. It provides those outputs when a batch is recomputed, saving the need to recompute them.
Note, however, that reusing the values is not always possible. In sequential algorithms that decide their next parameter values based on earlier results, modifications to the ELFI model will invalidate the earlier data. Rejection sampling, on the other hand, allows changing any of the summaries or distances and still reusing e.g. the simulations, because all the parameter values still come from the same priors.
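For example, a pool could be combined with the tutorial's method like this (a sketch, assuming the elfi.OutputPool alias and the MA2 node names):

import elfi

# Store the simulator and summary outputs so they can be reused if e.g.
# the distance node is changed and the batches are recomputed.
pool = elfi.OutputPool(['t1', 't2', 'MA2', 'S1', 'S2'])
custom_method = CustomMethod(m, 'd', threshold=.1, batch_size=1000, pool=pool)
custom_method.infer(n_sim=2000)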
Parameter inference base class

class elfi.methods.inference.parameter_inference.ParameterInference(model, output_names, batch_size=1, seed=None, pool=None, max_parallel_batches=None)

A base class for parameter inference methods.
Attributes:

model : elfi.ElfiModel
    The ELFI graph used by the algorithm.

output_names : list
    Names of the nodes whose outputs are included in the batches.

client : elfi.client.ClientBase
    The client in which the batches are computed.

max_parallel_batches : int

state : dict
    Stores any changing data related to achieving the objective. Must include a key n_batches for determining when the inference is finished.

objective : dict
    Holds the data for the algorithm to internally determine how many batches are still needed. You must have a key n_batches here. By default, the algorithm is finished when n_batches in the state dictionary is equal to or greater than the corresponding objective value.

batches : elfi.client.BatchHandler
    Helper class for submitting batches to the client and keeping track of their indexes.

pool : elfi.store.OutputPool
    Pool object for storing and reusing node outputs.
Construct the inference algorithm object. If you are implementing your own algorithm, do not forget to call super.

Parameters:
- model (elfi.ElfiModel) – Model to perform the inference with.
- output_names (list) – Names of the nodes whose outputs will be requested from the ELFI graph.
- batch_size (int, optional) – The number of parameter evaluations in each pass through the ELFI graph. When using a vectorized simulator, a suitably large batch_size can provide a significant performance boost.
- seed (int, optional) – Seed for the data generation from the ElfiModel.
- pool (elfi.store.OutputPool, optional) – OutputPool both stores and provides precomputed values for batches.
- max_parallel_batches (int, optional) – Maximum number of batches allowed to be in computation at the same time. Defaults to the number of cores in the client.
batch_size
    Return the current batch_size.

extract_result()
    Prepare the result from the current state of the inference. ELFI calls this method at the end of the inference to return the result.

    Returns: result
    Return type: elfi.methods.result.Result

finished
    Check whether the n_batches objective has been reached.

infer(*args, vis=None, bar=True, **kwargs)
    Set the objective and start the iterate loop until the inference is finished. See the other arguments from the set_objective method.

    Parameters:
    - vis (dict, optional) – Plotting options. More info in the self.plot_state method.
    - bar (bool, optional) – Flag to remove (False) or keep (True) the progress bar in the output.

    Returns: result

iterate()
    Advance the inference by one iteration.

    This is a way to manually progress the inference. One iteration consists of waiting for and processing the result of the next batch in succession, and possibly submitting new batches.

    Notes: If the next batch is ready, it will be processed immediately and no new batches are submitted. New batches are submitted only while waiting for the next one to complete. There will never be more batches submitted in parallel than the max_parallel_batches setting allows.

    Returns: None

parameter_names
    Return the parameters to be inferred.

plot_state(**kwargs)
    Plot the current state of the algorithm.

    Parameters:
    - axes (matplotlib.axes.Axes, optional)
    - figure (matplotlib.figure.Figure, optional)
    - xlim – x-axis limits
    - ylim – y-axis limits
    - interactive (bool, default False) – If true, uses IPython.display to update the cell figure.
    - close – Close the figure at the end of plotting. Used at the end of interactive mode.

    Returns: None

pool
    Return the output pool of the inference.

prepare_new_batch(batch_index)
    Prepare values for a new batch.

    ELFI calls this method before submitting a new batch with an increasing index batch_index. This is an optional method to override. Use it if you need to do preparations, e.g. in a Bayesian optimization algorithm the next acquisition points would be acquired here.

    If you need to provide values for certain nodes, you can do so by constructing a batch dictionary and returning it. See e.g. BayesianOptimization for an example.

    Parameters: batch_index (int) – the next batch_index to be submitted
    Returns: batch – Keys should match node names in the model. These values will override any default values or operations in those nodes.
    Return type: dict or None

seed
    Return the seed of the inference.

set_objective(*args, **kwargs)
    Set the objective of the inference.

    This method sets the objective of the inference (values typically stored in the self.objective dict).

    Returns: None

update(batch, batch_index)
    Update the inference state with a new batch.

    ELFI calls this method when a new batch has been computed and the state of the inference should be updated with it. It is also possible to bypass ELFI and call this directly to update the inference.

    Parameters:
    - batch (dict) – dict with self.outputs as keys and the corresponding outputs for the batch as values
    - batch_index (int)

    Returns: None