ELFI architecture

Here we explain the internal representation of the ELFI model. This representation contains everything that is needed to generate data, but is separate from e.g. the inference methods or the data storages. This information is aimed for developers and is not essential for using ELFI. We assume the reader is quite familiar with Python and has perhaps already read some of ELFI’s source code.

The low level representation of the ELFI model is a networkx.DiGraph with node names as the nodes. The representation of the node is stored to the corresponding attribute dictionary of the networkx.DiGraph. We call this attribute dictionary the node state dictionary. The networkx.DiGraph representation can be found from ElfiModel.source_net. Before the ELFI model can be ran, it needs to be compiled and loaded with data (e.g. observed data, precomputed data, batch index, batch size etc). The compilation and loading of data is the responsibility of the Client implementation and makes it possible in essence to translate ElfiModel to any kind of computational backend. Finally the class Executor is responsible for running the compiled and loaded model and producing the outputs of the nodes.

A user typically creates this low level representation by working with subclasses of NodeReference. These are easy to use UI classes of ELFI such as the elfi.Simulator or elfi.Prior. Under the hood they create proper node state dictionaries stored into the source_net. The callables such as simulators or summaries that the user provides to these classes are called operations.

The model graph representation

The source_net is a directed acyclic graph (DAG) and holds the state dictionaries of the nodes and the edges between the nodes. An edge represents a dependency. For example and edge from a prior node to the simulator node represents that the simulator requires a value from the prior to be able to run. The edge name corresponds to a parameter name for the operation, with integer names interpreted as positional parameters.

In the standard compilation process, the source_net is augmented with additional nodes such as batch_size or random_state, that are then added as dependencies for those operations that require them. In addition the state dicts will be turned into either a runnable operation or a precomputed value.

The execution order of the nodes in the compiled graph follows the topological ordering of the DAG (dependency order) and is guaranteed to be the same every time. Note that because the default behaviour is that nodes share a random state, changing a node that uses a shared random state will affect the result of any later node in the ordering using the same random state, even if they would be independent based on the graph topology.

State dictionary

The state of a node is a Python dictionary. It describes the type of the node and any other relevant state information, such as the user provided callable operation (e.g. simulator or summary statistic) and any additional parameters the operation needs to be provided in the compilation.

The following are reserved keywords of the state dict that serve as instructions for the ELFI compiler. They begin with an underscore. Currently these are:

_operationcallable: Operation of the node producing the output. Can not be used if _output is present.
_outputvariable: Constant output of the node. Can not be used if _operation is present.
_classclass: The subclass of NodeReference that created the state.
_stochasticbool, optional: Indicates that the node is stochastic. ELFI will provide a random_state argument for such nodes, which contains a RandomState object for drawing random quantities. This node will appear in the computation graph. Using ELFI provided random states makes it possible to have repeatable experiments in ELFI.
_observablebool, optional: Indicates that there is observed data for this node or that it can be derived from the observed data. ELFI will create a corresponding observed node into the compiled graph. These nodes are dependencies of discrepancy nodes.
_uses_batch_sizebool, optional: Indicates that the node operation requires batch_size as input. A corresponding edge from batch_size node to this node will be added to the compiled graph.
_uses_metabool, optional: Indicates that the node operation requires meta information dictionary about the execution. This includes, model name, batch index and submission index. Useful for e.g. creating informative and unique file names. If the operation is vectorized with elfi.tools.vectorize, then also index_in_batch will be added to the meta information dictionary.
_uses_observedbool, optional: Indicates that the node requires the observed data of its parents in the source_net as input. ELFI will gather the observed values of its parents to a tuple and link them to the node as a named argument observed.
_parameterbool, optional: Indicates that the node is a parameter node

The compilation and data loading phases

The compilation of the computation graph is separated from the loading of the data for making it possible to reuse the compiled model. The subclasses of the Loader class take responsibility of injecting data to the nodes of the compiled model. Examples of injected data are precomputed values from the OutputPool, the current random_state and so forth.