pylfi.inferences.ABCBase¶

class pylfi.inferences.ABCBase(observation, simulator, stat_calc, priors, inference_scheme, log=True)[source]¶

Base class for Approximate Bayesian Computation algorithms.

Parameters

observationarray_like: The observed data \(y_\mathrm{obs}\).
simulatorcallable: The simulator model parametrized by unknown model parameters \(\theta\). Any regular Python callable, i.e., function or class with __call__ method can be used.
stat_calccallable: Summary statistics calculator. Must take the return from simulator as arguments. Any regular Python callable, i.e., function or class with __call__ method can be used.
priorslist of Prior: List containing the priors for the unknown model parameters. The order must be the same as the order of positional arguments in simulator.
inference_schemestr: Name of inference scheme. To be passed by sub-class in call to base constructor.
logbool: Whether logger should be displayed or not. Default: True.

Attributes

`epsilon`	Tolerance parameter.
`quantile`	The chosen q-quantile.
`stat_scale`	The summary statistic scales.

Methods

`batches`(n_samples, n_jobs, seed[, force_equal])	Divide and conquer.
`distance`(s_sim, s_obs[, weight, scale])	Weighted Euclidean distance.
`journal`()	To be overwritten by sub-class: method to write and return `Journal` instance.
`pilot_study`([n_sim, quantile, stat_scale, ...])	Perform pilot study.
`reg_adjust`([method, transform, kernel, ...])	Post-sampling regression adjustment.
`sample`()	To be overwritten by sub-class: should implement sampling from inference scheme and optionally return `Journal` via the boolean `return_journal` keyword argument.

abstract sample()[source]¶

To be overwritten by sub-class: should implement sampling from inference scheme and optionally return Journal via the boolean return_journal keyword argument.

The sample method needs specific attribute names. See RejABC class for an example.

abstract journal()[source]¶

To be overwritten by sub-class: method to write and return Journal instance.

See RejABC class for an example.

Returns

Journal: Journal with results and information created by the run of inference schemes.

distance(s_sim, s_obs, weight=1.0, scale=1.0)[source]¶

Weighted Euclidean distance.

Computes the weighted Euclidean distance between two 1-D arrays of summary statistics. The weight parameter can be set to weight the importance of a summary statistic, whereas the scale parameter can be used to scale particular summary statistics in order to avoid dominance.

Parameters

s_sim{int, float}, array_like: Simulated summary statistic(s).
s_obs{int, float}, array_like: Observed summary statistic(s).
weight{int, float}, numpy.ndarray, optional: Importance weight(s) of summary statistic(s). Should sum to 1. Default: 1.0.
scale{int, float}, numpy.ndarray, optional: Scale weight(s) of summary statistic(s). Default: 1.0.

Returns

distancefloat: The weighted Euclidean distance between simulated and observed summary statistics.

batches(n_samples, n_jobs, seed, force_equal=False)[source]¶

Divide and conquer.

Divide the number of samples, n_samples, between n_jobs workers. If the ABC algorithm uses Markov chains, then n_jobs is the number chains. If n_jobs exceeds the number of available CPUs found by Pathos (this might include hardware threads), n_jobs is set to the number found by Pathos.

The method also creates a seed for each worker based on the input seed. This ensures reproducibility if wanted. If seed=None, the results will be stochastic between each run of the sampler.

When using Markov chains, the force_equal keyword must be set to True, in order to enforce equal length of chains. The n_samples that results in equal chain lengths will be found internally and returned. The corrected n_samples should be used downstream if needed.

Parameters

n_samplesint: Number of (posterior) samples to draw.
n_jobsint: Number of processes (workers). If n_jobs=-1, then n_jobs is set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those).
seedint: User-provided seed.
force_equalbool, optional: If True, n_samples will be adjusted to enforce equal number of tasks for each worker. Default: False.

Returns

n_samplesint: Possibly corrected number of samples (posterior draws).
n_jobsint: Possibly corrected number of processes (workers).
tasksint: The number of tasks for each parallel pool worker.
seedslist: Initial states for parallel pool workers.

pilot_study(n_sim=500, quantile=None, stat_scale=None, stat_weight=1.0, n_jobs=- 1, seed=None)[source]¶

Perform pilot study.

The pilot study runs the simulator n_sim times and sets the threshold parameter epsilon automatically as the q-quantile of simulated distances from the prior predictive distribution. For instance, the 0.5-quantile (the median) will give a threshold that accepts 50% of the simulations.

The pilot study can also be used to provide an estimate of the stat_scale parameter, used in the weighted Euclidean distance, from the prior predictive distribution by passing the stat_scale keyword as sd or mad. The stat_scale parameter is used to avoid dominance of particular summary statistics. stat_scale=sd scales the summary statistics according to their standard deviation (SD) estimated from the prior predictive samples, and stat_scale=mad according to their median absolute deviation (MAD).

It is important to note that if more than 50% of the prior predictive samples for a particular summary statistic have identical values, MAD will equal zero. In this case, the logger will raise a warning and the scale for the particular summary statistic will be set to SD instead. If there is no variability at all, the scale will be set to 1.

It is recommended to check that there are not too many identical samples before setting the scale, to avoid surprises.

Parameters

n_simsint, optional: Number of simulator runs.
quantileint: Quantile of the Euclidean distances.
stat_scalestr, optional: Summary statistics scale to estimate; can be set as either sd (standard deviation) or mad (median absolute deviation). If None, scale is set to 1.0. Default: None.
stat_weight{int, float}, numpy.ndarray, optional: Importance weights of summary statistics. Default: 1.0.
n_jobsint, optional: Number of processes (workers). If n_jobs=-1, then n_jobs is set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those). Default: -1.
seedint: User-provided seed. Will be used to generate seed for each worker. Default: None.

reg_adjust(method='loclinear', transform=True, kernel='epkov', return_journal=False)[source]¶

Post-sampling regression adjustment.

Regresses summary statistics on the obtained posterior samples, and corrects the posterior samples for the trend in the relationship.

Implementation based on Beaumont et al. (2002) and Blum (2017).

Parameters

methodstr, optional

The regression method to use:

linear: ordinary least squares regression;
loclinar: local linear regression (default).

transformbool, optional

If True, parameters are regressed on a logarithm scale. A log transformation will both stabilize the variance of the regression model and make it more homoscedastic. Note that the parameters must be positive in order to do a log transform. Default: True.

kernelstr, optional

The smoothing kernel function to use in local regression:

gaussian: Gaussian smoothing kernel;
epkov: Epanechnikov smoothing kernel.

return_journalbool, optional

If True, Journal is returned. Default: False.

References

M. Beaumont, W. Zhang and D.Balding. “Approximate Bayesian Computation in Population Genetics” GENETICS December 1, 2002 vol. 162 no. 4 2025-2035

M. Blum. “Regression approaches for approximate bayesian computation”. arXiv preprint arXiv:1707.01254, 2017 - arxiv.org

property epsilon¶

Tolerance parameter.

Returns

float

property quantile¶

The chosen q-quantile.

Returns

float

property stat_scale¶

The summary statistic scales.

Returns

numpy.ndarray

API Reference

pylfi.inferences.RejABC