Implementing a new inference method
This tutorial provides the fundamentals for implementing custom parameter inference methods using ELFI. ELFI provides many features out of the box, such as parallelization or random state handling. In a typical case these happen “automatically” behind the scenes when the algorithms are built on top of the provided interface classes.
The base class for parameter inference methods is the ParameterInference interface, found in the elfi.methods.inference.parameter_inference module. Among the methods in the interface, those that must be implemented raise a NotImplementedError. In addition, you will probably also want to override at least the update and __init__ methods.
Let’s create an empty skeleton for a custom method that includes just the minimal set of methods to create a working algorithm in ELFI:
from elfi.methods.inference.parameter_inference import ParameterInference

class CustomMethod(ParameterInference):
    def __init__(self, model, output_names, **kwargs):
        super(CustomMethod, self).__init__(model, output_names, **kwargs)

    def set_objective(self):
        # Request 3 batches to be generated
        self.objective['n_batches'] = 3

    def extract_result(self):
        return self.state
The extract_result method is called by ELFI at the end of the inference and should return a ParameterInferenceResult object (from the elfi.methods.result module). For illustration, however, we will begin by returning the member state dictionary, which stores all the current state information of the inference. Let's make an instance of our method and run it:
import elfi.examples.ma2 as ma2

# Get a ready-made MA2 model to test our inference method with
m = ma2.get_model()

# We want the outputs from node 'd' of the model `m` to be available
custom_method = CustomMethod(m, ['d'])

# Run the inference
custom_method.infer()  # {'n_batches': 3, 'n_sim': 3000}
Running the above returns the state dictionary. We will find a few keys in it that track basic properties of the state, such as n_batches, which tells how many batches have been generated, and n_sim, which gives the total number of simulations contained in those batches. The latter should be n_batches times the current batch size (custom_method.batch_size, which was 1000 here).
You will find that n_batches in the state dictionary has the value 3. This is because in our CustomMethod.set_objective method we set the n_batches key of the objective dictionary to that value. Every ParameterInference instance has a Python dictionary called objective that is a counterpart to the state dictionary. The objective defines the conditions under which the inference is finished. The default controlling key in that dictionary is the string n_batches, whose value tells ELFI how many batches in total need to be generated from the provided generative ElfiModel. Inference is considered finished when n_batches in the state matches or exceeds that in the objective. The generation of batches is automatically parallelized in the background, so we don't have to worry about it.
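For example, parallel computation of the batches can be enabled simply by switching the ELFI client before running the inference; a minimal sketch, assuming the multiprocessing client that ships with ELFI:

import elfi

# Compute batches in parallel across local processes; the inference
# code itself does not need to change.
elfi.set_client('multiprocessing')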
Note
A batch in ELFI is a dictionary that maps names of nodes of the generative model to their outputs. An output in the batch consists of one or more runs of its operation stored in a numpy array. Each batch has an index, and the outputs in the same batch are guaranteed to be the same if you recompute the batch.
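To make this concrete, a batch for the output names ['d'] with a batch size of 3 might look roughly like the following (the values are purely illustrative):

import numpy as np

# Node names map to numpy arrays holding one value per run in the batch
batch = {'d': np.array([0.73, 0.12, 1.05])}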
The algorithm, however, does nothing else at this point besides generating the 3 batches. To actually do something with them, we can add the update method, which allows us to update the state dictionary of the inference with any custom values. It takes the generated batch dictionary and its index, and is called by ELFI every time a new batch is received. Let's say we wish to filter parameters by a threshold (as in ABC rejection sampling) from the total number of simulations:
class CustomMethod(ParameterInference):
    def __init__(self, model, output_names, **kwargs):
        super(CustomMethod, self).__init__(model, output_names, **kwargs)

        # Hard code a threshold and discrepancy node name for now
        self.threshold = .1
        self.discrepancy_name = output_names[0]

        # Prepare lists to push the filtered outputs into
        self.state['filtered_outputs'] = {name: [] for name in output_names}

    def update(self, batch, batch_index):
        super(CustomMethod, self).update(batch, batch_index)

        # Make a filter mask (logical numpy array) from the distance array
        filter_mask = batch[self.discrepancy_name] <= self.threshold

        # Append the filtered parameters to their lists
        for name in self.output_names:
            values = batch[name]
            self.state['filtered_outputs'][name].append(values[filter_mask])

    ...  # other methods as before

m = ma2.get_model()
custom_method = CustomMethod(m, ['d'])
custom_method.infer()  # {'n_batches': 3, 'n_sim': 3000, 'filtered_outputs': ...}
After running this, the returned state dictionary should contain a filtered_outputs key with the filtered distances for node d from the 3 batches.
Note
The reason for the imposed structure in ParameterInference is to encourage a design where one can advance the inference iteratively using the iterate method. This makes it possible to stop at any point, check the current state, and continue. This is important, as there are usually many moving parts, such as the choice of summary statistics or deciding on a good discrepancy function.
Now, to be useful, we should allow the user to set the different options; requesting 3 batches is not going to take them very far. The user also probably thinks in terms of simulations rather than batches. ELFI allows you to replace the n_batches key with an n_sim key in the objective, to spare you from converting n_sim to n_batches in your code. Just note that n_sim in the state will always be a multiple of batch_size.
Let's modify the algorithm so that the user can pass the threshold, the name of the discrepancy node, and the number of simulations. Let's also add the parameters to the outputs:
class CustomMethod(ParameterInference):
    def __init__(self, model, discrepancy_name, threshold, **kwargs):
        # Create a name list of nodes whose outputs we wish to receive
        output_names = [discrepancy_name] + model.parameter_names
        super(CustomMethod, self).__init__(model, output_names, **kwargs)

        self.threshold = threshold
        self.discrepancy_name = discrepancy_name

        # Prepare lists to push the filtered outputs into
        self.state['filtered_outputs'] = {name: [] for name in output_names}

    def set_objective(self, n_sim):
        self.objective['n_sim'] = n_sim

    ...  # other methods as before

# Run it
custom_method = CustomMethod(m, 'd', threshold=.1, batch_size=1000)
custom_method.infer(n_sim=2000)  # {'n_batches': 2, 'n_sim': 2000, 'filtered_outputs': ...}
Calling the inference method now returns a state dictionary that also contains the filtered parameters from each of the batches. Note that any arguments given to the infer method are passed on to the set_objective method.
Now due to the structure of the algorithm the user can immediately continue from this state:
# Continue inference from the previous state (with n_sim=2000)
custom_method.infer(n_sim=4000)  # {'n_batches': 4, 'n_sim': 4000, 'filtered_outputs': ...}

# Or use it iteratively
custom_method.set_objective(n_sim=6000)

custom_method.iterate()
assert not custom_method.finished

# Investigate the current state
custom_method.extract_result()  # {'n_batches': 5, 'n_sim': 5000, 'filtered_outputs': ...}

custom_method.iterate()
assert custom_method.finished
custom_method.extract_result()  # {'n_batches': 6, 'n_sim': 6000, 'filtered_outputs': ...}
This works because the state is stored in the custom_method instance and we only change the objective. ELFI also calls iterate internally in the infer method.
The last finishing touch to our algorithm is to convert the state dict to a more user-friendly format in the extract_result method. First we convert the lists of filtered arrays from the batches to numpy arrays. We then wrap the result in an elfi.methods.results.Sample object and return it instead of the state dict. Below is the final complete implementation of our inference method class:
import numpy as np

from elfi.methods.inference.parameter_inference import ParameterInference
from elfi.methods.results import Sample

class CustomMethod(ParameterInference):
    def __init__(self, model, discrepancy_name, threshold, **kwargs):
        # Create a name list of nodes whose outputs we wish to receive
        output_names = [discrepancy_name] + model.parameter_names
        super(CustomMethod, self).__init__(model, output_names, **kwargs)

        self.threshold = threshold
        self.discrepancy_name = discrepancy_name

        # Prepare lists to push the filtered outputs into
        self.state['filtered_outputs'] = {name: [] for name in output_names}

    def set_objective(self, n_sim):
        self.objective['n_sim'] = n_sim

    def update(self, batch, batch_index):
        super(CustomMethod, self).update(batch, batch_index)

        # Make a filter mask (logical numpy array) from the distance array
        filter_mask = batch[self.discrepancy_name] <= self.threshold

        # Append the filtered parameters to their lists
        for name in self.output_names:
            values = batch[name]
            self.state['filtered_outputs'][name].append(values[filter_mask])

    def extract_result(self):
        filtered_outputs = self.state['filtered_outputs']
        outputs = {name: np.concatenate(filtered_outputs[name]) for name in self.output_names}

        return Sample(
            method_name='CustomMethod',
            outputs=outputs,
            parameter_names=self.parameter_names,
            discrepancy_name=self.discrepancy_name,
            n_sim=self.state['n_sim'],
            threshold=self.threshold
        )
Running the inference with the above implementation should now produce a user-friendly output:
Method: CustomMethod
Number of posterior samples: 82
Number of simulations: 10000
Threshold: 0.1
Posterior means: t1: 0.687, t2: 0.152
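The returned Sample object can also be inspected programmatically; for instance (assuming Sample exposes the accepted values through its samples dictionary, keyed by parameter name):

result = custom_method.infer(n_sim=10000)
print(result)  # prints the summary shown above

# Accepted parameter values as numpy arrays
t1_samples = result.samples['t1']
print(t1_samples.mean())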
Where to go from here
When implementing your own method, it is advisable to read the documentation of the ParameterInference class. In addition, we recommend reading the Rejection, SMC and/or BayesianOptimization class implementations in the source for some more advanced techniques. These methods demonstrate, for example, how to inject values from outside into the ELFI model (acquisition functions in BayesianOptimization), how to modify the user-provided model to get e.g. the pdfs of the parameters (SMC), and so forth.
Good to know
ELFI guarantees that computing a batch with the same index will always produce the same output, given the same model and ComputationContext object. The ComputationContext object holds the batch size, the seed for the PRNG, and the pool object of precomputed batches of nodes. If your method uses random quantities in the algorithm, please make sure to use the seed attribute of ParameterInference so that your results will be consistent.
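A minimal sketch of how a subclass might do this, building on the CustomMethod above:

import numpy as np

class MyStochasticMethod(CustomMethod):
    def __init__(self, model, discrepancy_name, threshold, **kwargs):
        super(MyStochasticMethod, self).__init__(
            model, discrepancy_name, threshold, **kwargs)
        # Derive all internal randomness from the inference seed so that
        # repeated runs with the same seed give identical results.
        self.random_state = np.random.RandomState(self.seed)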
If you want to provide values for the outputs of certain nodes from outside the generative model, you can return them from the prepare_new_batch method. They will replace any default value or operation in those nodes. This is used e.g. in BOLFI, where values from the acquisition function replace values coming from the prior during the Bayesian optimization phase.
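A sketch of how overriding prepare_new_batch could look; the proposal logic here is hypothetical, and t1 and t2 are the MA2 parameter node names:

import numpy as np

class MyActiveMethod(CustomMethod):
    def prepare_new_batch(self, batch_index):
        # Provide the parameter values ourselves instead of letting them
        # come from the priors. The keys of the returned dict must match
        # node names in the model.
        proposals = np.random.uniform(0, 1, size=(self.batch_size, 2))
        return {'t1': proposals[:, 0], 't2': proposals[:, 1]}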
The ParameterInference instance also has the following helper classes:
BatchHandler
The ParameterInference class instantiates an elfi.client.BatchHandler helper class that is set as the self.batches member variable. This object is in essence a wrapper over the Client interface, making it easier to work with batches that are in computation. Among the duties of BatchHandler are keeping track of the current batch_index and of the status of the batches that have been submitted. You often don't need to interact with it directly.
OutputPool
elfi.store.OutputPool serves a dual purpose:
1. It stores all the computed outputs of selected nodes.
2. It provides those outputs when a batch is recomputed, saving the need to recompute them.
Note, however, that reusing the values is not always possible. In sequential algorithms that decide their next parameter values based on earlier results, modifications to the ELFI model will invalidate the earlier data. Rejection sampling, on the other hand, allows changing any of the summaries or distances and still reusing e.g. the simulations, because all the parameter values still come from the same priors.
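For example, a pool could be combined with the tutorial's method like this (a sketch, assuming the elfi.OutputPool alias and the MA2 node names):

import elfi

# Store the simulator and summary outputs so they can be reused if e.g.
# the distance node is changed and the batches are recomputed.
pool = elfi.OutputPool(['t1', 't2', 'MA2', 'S1', 'S2'])
custom_method = CustomMethod(m, 'd', threshold=.1, batch_size=1000, pool=pool)
custom_method.infer(n_sim=2000)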
Parameter inference base class

class elfi.methods.inference.parameter_inference.ParameterInference(model, output_names, batch_size=1, seed=None, pool=None, max_parallel_batches=None)

A base class for parameter inference methods.
Attributes:

model : elfi.ElfiModel
    The ELFI graph used by the algorithm.

output_names : list
    Names of the nodes whose outputs are included in the batches.

client : elfi.client.ClientBase
    The client in which the batches are computed.

max_parallel_batches : int

state : dict
    Stores any changing data related to achieving the objective. Must include a key n_batches for determining when the inference is finished.

objective : dict
    Holds the data for the algorithm to internally determine how many batches are still needed. You must have a key n_batches here. By default, the algorithm is finished when n_batches in the state dictionary is equal to or greater than the corresponding objective value.

batches : elfi.client.BatchHandler
    Helper class for submitting batches to the client and keeping track of their indexes.

pool : elfi.store.OutputPool
    Pool object for storing and reusing node outputs.
Construct the inference algorithm object. If you are implementing your own algorithm, do not forget to call super.

Parameters:
- model (elfi.ElfiModel) – Model to perform the inference with.
- output_names (list) – Names of the nodes whose outputs will be requested from the ELFI graph.
- batch_size (int, optional) – The number of parameter evaluations in each pass through the ELFI graph. When using a vectorized simulator, a suitably large batch_size can provide a significant performance boost.
- seed (int, optional) – Seed for the data generation from the ElfiModel.
- pool (elfi.store.OutputPool, optional) – OutputPool both stores and provides precomputed values for batches.
- max_parallel_batches (int, optional) – Maximum number of batches allowed to be in computation at the same time. Defaults to the number of cores in the client.
batch_size
    Return the current batch_size.

extract_result()
    Prepare the result from the current state of the inference. ELFI calls this method at the end of the inference to return the result.

    Returns: result
    Return type: elfi.methods.result.Result

finished
    Check whether the n_batches objective has been reached.

infer(*args, vis=None, bar=True, **kwargs)
    Set the objective and start the iterate loop until the inference is finished. See the other arguments from the set_objective method.

    Parameters:
    - vis (dict, optional) – Plotting options. More info in the self.plot_state method.
    - bar (bool, optional) – Flag to remove (False) or keep (True) the progress bar in the output.

    Returns: result

iterate()
    Advance the inference by one iteration.

    This is a way to manually progress the inference. One iteration consists of waiting for and processing the result of the next batch in succession, and possibly submitting new batches.

    Notes: If the next batch is ready, it will be processed immediately and no new batches are submitted. New batches are submitted only while waiting for the next one to complete. There will never be more batches submitted in parallel than the max_parallel_batches setting allows.

    Returns: None

parameter_names
    Return the parameters to be inferred.

plot_state(**kwargs)
    Plot the current state of the algorithm.

    Parameters:
    - axes (matplotlib.axes.Axes, optional)
    - figure (matplotlib.figure.Figure, optional)
    - xlim – x-axis limits
    - ylim – y-axis limits
    - interactive (bool, default False) – If true, uses IPython.display to update the cell figure.
    - close – Close the figure at the end of plotting. Used at the end of interactive mode.

    Returns: None

pool
    Return the output pool of the inference.

prepare_new_batch(batch_index)
    Prepare values for a new batch.

    ELFI calls this method before submitting a new batch with an increasing index batch_index. This is an optional method to override. Use it if you need to do preparations, e.g. in a Bayesian optimization algorithm the next acquisition points would be acquired here.

    If you need to provide values for certain nodes, you can do so by constructing a batch dictionary and returning it. See e.g. BayesianOptimization for an example.

    Parameters: batch_index (int) – the next batch_index to be submitted
    Returns: batch – Keys should match node names in the model. These values will override any default values or operations in those nodes.
    Return type: dict or None

seed
    Return the seed of the inference.

set_objective(*args, **kwargs)
    Set the objective of the inference.

    This method sets the objective of the inference (values typically stored in the self.objective dict).

    Returns: None

update(batch, batch_index)
    Update the inference state with a new batch.

    ELFI calls this method when a new batch has been computed and the state of the inference should be updated with it. It is also possible to bypass ELFI and call this directly to update the inference.

    Parameters:
    - batch (dict) – dict with self.outputs as keys and the corresponding outputs for the batch as values
    - batch_index (int)

    Returns: None