pylfi.inferences.ABCBase

class pylfi.inferences.ABCBase(observation, simulator, stat_calc, priors, inference_scheme, log=True)[source]

Base class for Approximate Bayesian Computation algorithms.

Parameters
observationarray_like

The observed data \(y_\mathrm{obs}\).

simulatorcallable

The simulator model parametrized by unknown model parameters \(\theta\). Any regular Python callable, i.e., function or class with __call__ method can be used.

stat_calccallable

Summary statistics calculator. Must take the return from simulator as arguments. Any regular Python callable, i.e., function or class with __call__ method can be used.

priorslist of Prior

List containing the priors for the unknown model parameters. The order must be the same as the order of positional arguments in simulator.

inference_schemestr

Name of inference scheme. To be passed by sub-class in call to base constructor.

logbool

Whether logger should be displayed or not. Default: True.

Attributes

epsilon

Tolerance parameter.

quantile

The chosen q-quantile.

stat_scale

The summary statistic scales.

Methods

batches(n_samples, n_jobs, seed[, force_equal])

Divide and conquer.

distance(s_sim, s_obs[, weight, scale])

Weighted Euclidean distance.

journal()

To be overwritten by sub-class: method to write and return Journal instance.

pilot_study([n_sim, quantile, stat_scale, ...])

Perform pilot study.

reg_adjust([method, transform, kernel, ...])

Post-sampling regression adjustment.

sample()

To be overwritten by sub-class: should implement sampling from inference scheme and optionally return Journal via the boolean return_journal keyword argument.

abstract sample()[source]

To be overwritten by sub-class: should implement sampling from inference scheme and optionally return Journal via the boolean return_journal keyword argument.

The sample method needs specific attribute names. See RejABC class for an example.

abstract journal()[source]

To be overwritten by sub-class: method to write and return Journal instance.

See RejABC class for an example.

Returns
Journal

Journal with results and information created by the run of inference schemes.

distance(s_sim, s_obs, weight=1.0, scale=1.0)[source]

Weighted Euclidean distance.

Computes the weighted Euclidean distance between two 1-D arrays of summary statistics. The weight parameter can be set to weight the importance of a summary statistic, whereas the scale parameter can be used to scale particular summary statistics in order to avoid dominance.

Parameters
s_sim{int, float}, array_like

Simulated summary statistic(s).

s_obs{int, float}, array_like

Observed summary statistic(s).

weight{int, float}, numpy.ndarray, optional

Importance weight(s) of summary statistic(s). Should sum to 1. Default: 1.0.

scale{int, float}, numpy.ndarray, optional

Scale weight(s) of summary statistic(s). Default: 1.0.

Returns
distancefloat

The weighted Euclidean distance between simulated and observed summary statistics.

batches(n_samples, n_jobs, seed, force_equal=False)[source]

Divide and conquer.

Divide the number of samples, n_samples, between n_jobs workers. If the ABC algorithm uses Markov chains, then n_jobs is the number chains. If n_jobs exceeds the number of available CPUs found by Pathos (this might include hardware threads), n_jobs is set to the number found by Pathos.

The method also creates a seed for each worker based on the input seed. This ensures reproducibility if wanted. If seed=None, the results will be stochastic between each run of the sampler.

When using Markov chains, the force_equal keyword must be set to True, in order to enforce equal length of chains. The n_samples that results in equal chain lengths will be found internally and returned. The corrected n_samples should be used downstream if needed.

Parameters
n_samplesint

Number of (posterior) samples to draw.

n_jobsint

Number of processes (workers). If n_jobs=-1, then n_jobs is set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those).

seedint

User-provided seed.

force_equalbool, optional

If True, n_samples will be adjusted to enforce equal number of tasks for each worker. Default: False.

Returns
n_samplesint

Possibly corrected number of samples (posterior draws).

n_jobsint

Possibly corrected number of processes (workers).

tasksint

The number of tasks for each parallel pool worker.

seedslist

Initial states for parallel pool workers.

pilot_study(n_sim=500, quantile=None, stat_scale=None, stat_weight=1.0, n_jobs=- 1, seed=None)[source]

Perform pilot study.

The pilot study runs the simulator n_sim times and sets the threshold parameter epsilon automatically as the q-quantile of simulated distances from the prior predictive distribution. For instance, the 0.5-quantile (the median) will give a threshold that accepts 50% of the simulations.

The pilot study can also be used to provide an estimate of the stat_scale parameter, used in the weighted Euclidean distance, from the prior predictive distribution by passing the stat_scale keyword as sd or mad. The stat_scale parameter is used to avoid dominance of particular summary statistics. stat_scale=sd scales the summary statistics according to their standard deviation (SD) estimated from the prior predictive samples, and stat_scale=mad according to their median absolute deviation (MAD).

It is important to note that if more than 50% of the prior predictive samples for a particular summary statistic have identical values, MAD will equal zero. In this case, the logger will raise a warning and the scale for the particular summary statistic will be set to SD instead. If there is no variability at all, the scale will be set to 1.

It is recommended to check that there are not too many identical samples before setting the scale, to avoid surprises.

Parameters
n_simsint, optional

Number of simulator runs.

quantileint

Quantile of the Euclidean distances.

stat_scalestr, optional

Summary statistics scale to estimate; can be set as either sd (standard deviation) or mad (median absolute deviation). If None, scale is set to 1.0. Default: None.

stat_weight{int, float}, numpy.ndarray, optional

Importance weights of summary statistics. Default: 1.0.

n_jobsint, optional

Number of processes (workers). If n_jobs=-1, then n_jobs is set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those). Default: -1.

seedint

User-provided seed. Will be used to generate seed for each worker. Default: None.

reg_adjust(method='loclinear', transform=True, kernel='epkov', return_journal=False)[source]

Post-sampling regression adjustment.

Regresses summary statistics on the obtained posterior samples, and corrects the posterior samples for the trend in the relationship.

Implementation based on Beaumont et al. (2002) and Blum (2017).

Parameters
methodstr, optional
The regression method to use:
  • linear: ordinary least squares regression;

  • loclinar: local linear regression (default).

transformbool, optional

If True, parameters are regressed on a logarithm scale. A log transformation will both stabilize the variance of the regression model and make it more homoscedastic. Note that the parameters must be positive in order to do a log transform. Default: True.

kernelstr, optional
The smoothing kernel function to use in local regression:
  • gaussian: Gaussian smoothing kernel;

  • epkov: Epanechnikov smoothing kernel.

return_journalbool, optional

If True, Journal is returned. Default: False.

References

M. Beaumont, W. Zhang and D.Balding. “Approximate Bayesian Computation in Population Genetics” GENETICS December 1, 2002 vol. 162 no. 4 2025-2035

M. Blum. “Regression approaches for approximate bayesian computation”. arXiv preprint arXiv:1707.01254, 2017 - arxiv.org

property epsilon

Tolerance parameter.

Returns
float
property quantile

The chosen q-quantile.

Returns
float
property stat_scale

The summary statistic scales.

Returns
numpy.ndarray