2. Getting Started

pyLFI is a Python toolbox for Bayesian parameter estimation in models with intractable likelihood functions. By using Likelihood-Free Inference (LFI) schemes, in particular Approximate Bayesian Computation (ABC), pyLFI estimates the posterior distributions over model parameters. LFI is also known under the moniker Simulation-Based Inference (SBI).

Introduction

Mechanistic models aim to explain phenomena in terms of causal mechanisms, and candidate models are validated by investigating whether the proposed mechanisms can explain the observed experimental data. Mechanistic modelling is generally done with differential equations, and these models often contain parameters that cannot be measured directly. A central challenge in building a mechanistic model is to identify a parametrization of the system that brings the model into agreement with experimental data.

Many mechanistic models are defined through simulators, which describe how the process generates data. However, simulators are poorly suited for inference and give rise to challenging inverse problems. Standard Bayesian inference is performed within a statistical model from which the likelihood can be derived. For simulator models, the likelihood is generally intractable or computationally infeasible to evaluate, which puts the standard approach to inference out of reach.

LFI, or SBI, refers to a suite of algorithms that avoid explicit likelihood evaluations by instead using model simulations.

The ABC of Approximate Bayesian Computation

Approximate Bayesian Computation (ABC) constitutes a class of computational sampling algorithms rooted in Bayesian statistics that bypass evaluation of the likelihood function. Given observed data \(y_\mathrm{obs}\), a simulator model \(\mathrm{M}(\theta)\) with parameters \(\theta\) having prior distributions \(\pi (\theta)\), ABC algorithms can be used to estimate the posterior distributions \(\pi (\theta \mid y_\mathrm{obs})\) over model parameters.
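For reference, the posterior that ABC targets follows Bayes' theorem, shown in the first line below. The likelihood notation \(p(y_\mathrm{obs} \mid \theta)\) is introduced here for illustration; it is the factor that is intractable for simulator models. The second line is the standard rejection-ABC approximation, in which the likelihood is replaced by the probability that summaries \(S(y)\) of simulated data fall within a tolerance \(\epsilon\) of the observed summaries under a distance \(\rho\). The symbols \(\rho\), \(\epsilon\) and the indicator \(\mathbb{1}\) are common ABC notation assumed here, not names taken from pyLFI.

```latex
\pi(\theta \mid y_\mathrm{obs}) \propto p(y_\mathrm{obs} \mid \theta)\,\pi(\theta)

\pi_\epsilon(\theta \mid y_\mathrm{obs}) \propto \pi(\theta) \int \mathbb{1}\left[\rho\big(S(y), S(y_\mathrm{obs})\big) \le \epsilon\right] p(y \mid \theta)\,\mathrm{d}y
```

The integral never needs to be evaluated explicitly. Drawing \(\theta\) from the prior, simulating \(y\) from the model and keeping \(\theta\) whenever the indicator equals one produces samples from \(\pi_\epsilon\), which approaches the true posterior as \(\epsilon \to 0\) if \(S\) is sufficient.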

At its heart, the ABC approach is simple: evaluation of the likelihood is replaced by a comparison of simulated data (generated by the simulator model) with observed data. This comparison is used to assess how likely it is that the model, with a given set of parameter values, could have produced the observed data.
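The idea can be sketched with the simplest ABC variant, rejection ABC, using only the Python standard library. This sketch does not use pyLFI. The Gaussian toy simulator, the sample-mean summary statistic, the Uniform(-5, 5) prior, the absolute-difference distance and the tolerance `epsilon` are all illustrative assumptions.

```python
import random
import statistics


def simulator(theta, n, rng):
    """Hypothetical toy simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(y):
    """Summary statistic s = S(y): here simply the sample mean."""
    return statistics.fmean(y)


def rejection_abc(y_obs, prior_sample, n_draws, epsilon, rng):
    """Basic rejection ABC: keep every theta whose simulated summary lies
    within epsilon of the observed summary (absolute difference as distance)."""
    s_obs = summary(y_obs)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                   # 1. draw a candidate from the prior
        y_sim = simulator(theta, len(y_obs), rng)    # 2. simulate data with that candidate
        if abs(summary(y_sim) - s_obs) <= epsilon:   # 3. compare simulated and observed summaries
            accepted.append(theta)                   # 4. accept the candidate if they are close
    return accepted


rng = random.Random(42)
y_obs = simulator(2.0, 20, rng)  # synthetic "observed" data generated with theta = 2.0
posterior = rejection_abc(
    y_obs,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),  # Uniform(-5, 5) prior
    n_draws=20_000,
    epsilon=0.05,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior), statistics.stdev(posterior))
```

No likelihood is evaluated anywhere; only simulations and summary comparisons are used. Shrinking `epsilon` brings the accepted samples closer to the true posterior but lowers the acceptance rate, so more simulations are needed.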

Parameter Identification with pyLFI

pyLFI is designed to be general and flexible so that it can also accommodate other algorithms. The price for this generality and flexibility is that simulating data and calculating summary statistics from the data are left entirely to the user.

To perform parameter identification with pyLFI, there are generally four inputs that need to be specified:

  1. A simulator model. The mechanistic model needs to be specified through a simulator model that can generate simulated data \(y_\mathrm{sim}\) for any parameters \(\theta\). The simulator must be a Python callable.

  2. A summary statistics calculator. The ABC algorithms require the use of low-dimensional summary statistics \(s = S(y)\) calculated from the raw data \(y\). The summary statistics calculator must be a Python callable.

  3. Observed data \(y_\mathrm{obs}\). This must be in the same form as \(y_\mathrm{sim}\).

  4. A prior \(\pi (\theta)\) for each unknown parameter that describes the range of possible parameter values. Priors must be pylfi.Prior objects.
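A minimal sketch of the first three inputs as plain Python objects, assuming a hypothetical mechanistic model: exponential decay \(\mathrm{d}y/\mathrm{d}t = -k y\), solved exactly and observed with Gaussian noise. The function names, the observation times and the choice of summary statistics are illustrative assumptions. The call signature pyLFI expects from a simulator is not given in this section, so the keyword arguments below are also assumed. The prior is left out because the `pylfi.Prior` constructor is not described here.

```python
import math
import random
import statistics

# Hypothetical observation times 0.0, 0.5, ..., 10.0
T = [0.5 * i for i in range(21)]


def simulator(k, y0=10.0, sigma=0.2, seed=None):
    """Hypothetical simulator: exact solution of dy/dt = -k*y at the times T,
    plus Gaussian measurement noise with standard deviation sigma."""
    rng = random.Random(seed)
    return [y0 * math.exp(-k * t) + rng.gauss(0.0, sigma) for t in T]


def summary_calculator(y):
    """Hypothetical low-dimensional summaries s = S(y):
    mean of the first five points, mean of the last five points, and the total."""
    return (statistics.fmean(y[:5]), statistics.fmean(y[-5:]), sum(y))


# Observed data: measured in practice; simulated here with k = 0.3 for illustration.
# It has the same form as any simulator output (a list of len(T) floats).
y_obs = simulator(0.3, seed=1)

# The prior over k must be a pylfi.Prior object; it is not constructed here
# because its constructor is not described in this section.
```

Because both callables are ordinary Python functions, they can be tested on their own, for example by checking that `summary_calculator(simulator(k))` returns sensible values across the intended prior range, before any inference is run.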

Simulators or summary statistics calculators not written in Python can be used as long as they can be wrapped in a Python function or in a class that implements the __call__ method.
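As a sketch of such a wrapper, the class below turns a command-line program into a Python callable through `__call__`. It assumes, hypothetically, that the external program takes the parameter values as command-line arguments and prints the simulated values to standard output, separated by whitespace. The class name and this input/output convention are illustrative choices, not part of pyLFI. To keep the example self-contained, a one-line Python script run by the current interpreter stands in for the external executable.

```python
import subprocess
import sys


class ExternalSimulator:
    """Wrap a command-line simulator so it can be called like a Python function.

    Hypothetical convention: parameters are passed as command-line arguments,
    and the program prints simulated values separated by whitespace.
    """

    def __init__(self, command):
        self.command = list(command)

    def __call__(self, *theta):
        result = subprocess.run(
            self.command + [str(t) for t in theta],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the program exits with an error
        )
        return [float(value) for value in result.stdout.split()]


# Stand-in for a non-Python executable: prints a*t + b for t = 0, 1, 2, 3, 4.
script = "import sys; a, b = map(float, sys.argv[1:3]); print(*(a * t + b for t in range(5)))"
sim = ExternalSimulator([sys.executable, "-c", script])
print(sim(2.0, 1.0))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

A summary statistics calculator written in another language can be wrapped in the same way, passing the data instead of the parameters (for example through a temporary file or standard input).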