pylfi.inferences.MCMCABC¶

class pylfi.inferences.MCMCABC(observation, simulator, stat_calc, priors, log=True)[source]¶

Class implementing the MCMC ABC algorithm.

Attributes

`epsilon`	Tolerance parameter.
`quantile`	The chosen q-quantile.
`stat_scale`	The summary statistic scales.

prop_scale

Methods

`batches`(n_samples, n_jobs, seed[, force_equal])	Divide and conquer.
`distance`(s_sim, s_obs[, weight, scale])	Weighted Euclidean distance.
`journal`()	Create and return an instance of Journal class.
`pilot_study`([n_sim, quantile, stat_scale, ...])	Perform pilot study.
`reg_adjust`([method, transform, kernel, ...])	Post-sampling regression adjustment.
`sample`(n_samples[, epsilon, prop_scale, ...])	tune: bool
`tune`([prop_scale, epsilon, tune_iter, ...])	Tune the proposal scale

tune(prop_scale=0.5, epsilon=None, tune_iter=500, tune_interval=100, stat_weight=1.0, stat_scale=1.0, seed=None, use_pilot=False)[source]¶

Tune the proposal scale

So how do we choose sd for the proposal distribution? There are some papers that suggest Metropolis-Hastings is most efficient when you accept 23.4% of proposed samples, and it turns out that lowering step size increases the probability of accepting a proposal. PyMC3 will spend the first 500 steps increasing and decreasing the step size to try to find the best value of sd that will give you an acceptance rate of 23.4% (you can even set different acceptance rates).

The problem is that if you change the step size while sampling, you lose the guarantees that your samples (asymptotically) come from the target distribution, so you should typically discard these. Also, there is typically a lot more adaptation going on in those first steps than just step_size.

sample(n_samples, epsilon=None, prop_scale=0.5, burn=100, tune=True, tune_iter=500, tune_interval=100, stat_weight=1.0, stat_scale=1.0, use_pilot=False, chains=2, seed=None, return_journal=False)[source]¶

tune: bool: Flag for tuning. Defaults to True.
tune_interval: int: The frequency of tuning. Defaults to 100 iterations.

Due to multiprocessing, estimation time (iteration per loop, total time, etc.) could be unstable, but the progress bar works perfectly.

A good choice for the number of jobs is the number of cores or processors on your computer. If your processor supports hyperthreading, you can select an even higher number of jobs. The number of jobs is set to the number of cores found in the system by default.

There are some papers that suggest Metropolis-Hastings is most efficient when you accept 23.4% of proposed samples, and it turns out that lowering step size increases the probability of accepting a proposal. PyMC3 will spend the first 500 steps increasing and decreasing the step size to try to find the best value of sd that will give you an acceptance rate of 23.4% (you can even set different acceptance rates).

burn : either burn away or add to n_samples

journal()[source]¶: Create and return an instance of Journal class.

batches(n_samples, n_jobs, seed, force_equal=False)¶

Divide and conquer.

Divide the number of samples, n_samples, between n_jobs workers. If the ABC algorithm uses Markov chains, then n_jobs is the number chains. If n_jobs exceeds the number of available CPUs found by Pathos (this might include hardware threads), n_jobs is set to the number found by Pathos.

The method also creates a seed for each worker based on the input seed. This ensures reproducibility if wanted. If seed=None, the results will be stochastic between each run of the sampler.

When using Markov chains, the force_equal keyword must be set to True, in order to enforce equal length of chains. The n_samples that results in equal chain lengths will be found internally and returned. The corrected n_samples should be used downstream if needed.

Parameters

n_samplesint: Number of (posterior) samples to draw.
n_jobsint: Number of processes (workers). If n_jobs=-1, then n_jobs is set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those).
seedint: User-provided seed.
force_equalbool, optional: If True, n_samples will be adjusted to enforce equal number of tasks for each worker. Default: False.

Returns

n_samplesint: Possibly corrected number of samples (posterior draws).
n_jobsint: Possibly corrected number of processes (workers).
tasksint: The number of tasks for each parallel pool worker.
seedslist: Initial states for parallel pool workers.

distance(s_sim, s_obs, weight=1.0, scale=1.0)¶

Weighted Euclidean distance.

Computes the weighted Euclidean distance between two 1-D arrays of summary statistics. The weight parameter can be set to weight the importance of a summary statistic, whereas the scale parameter can be used to scale particular summary statistics in order to avoid dominance.

Parameters

s_sim{int, float}, array_like: Simulated summary statistic(s).
s_obs{int, float}, array_like: Observed summary statistic(s).
weight{int, float}, numpy.ndarray, optional: Importance weight(s) of summary statistic(s). Should sum to 1. Default: 1.0.
scale{int, float}, numpy.ndarray, optional: Scale weight(s) of summary statistic(s). Default: 1.0.

Returns

distancefloat: The weighted Euclidean distance between simulated and observed summary statistics.

property epsilon¶

Tolerance parameter.

Returns

float

pilot_study(n_sim=500, quantile=None, stat_scale=None, stat_weight=1.0, n_jobs=- 1, seed=None)¶

Perform pilot study.

The pilot study runs the simulator n_sim times and sets the threshold parameter epsilon automatically as the q-quantile of simulated distances from the prior predictive distribution. For instance, the 0.5-quantile (the median) will give a threshold that accepts 50% of the simulations.

The pilot study can also be used to provide an estimate of the stat_scale parameter, used in the weighted Euclidean distance, from the prior predictive distribution by passing the stat_scale keyword as sd or mad. The stat_scale parameter is used to avoid dominance of particular summary statistics. stat_scale=sd scales the summary statistics according to their standard deviation (SD) estimated from the prior predictive samples, and stat_scale=mad according to their median absolute deviation (MAD).

It is important to note that if more than 50% of the prior predictive samples for a particular summary statistic have identical values, MAD will equal zero. In this case, the logger will raise a warning and the scale for the particular summary statistic will be set to SD instead. If there is no variability at all, the scale will be set to 1.

It is recommended to check that there are not too many identical samples before setting the scale, to avoid surprises.

Parameters

n_simsint, optional: Number of simulator runs.
quantileint: Quantile of the Euclidean distances.
stat_scalestr, optional: Summary statistics scale to estimate; can be set as either sd (standard deviation) or mad (median absolute deviation). If None, scale is set to 1.0. Default: None.
stat_weight{int, float}, numpy.ndarray, optional: Importance weights of summary statistics. Default: 1.0.
n_jobsint, optional: Number of processes (workers). If n_jobs=-1, then n_jobs is set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those). Default: -1.
seedint: User-provided seed. Will be used to generate seed for each worker. Default: None.

property quantile¶

The chosen q-quantile.

Returns

float

reg_adjust(method='loclinear', transform=True, kernel='epkov', return_journal=False)¶

Post-sampling regression adjustment.

Regresses summary statistics on the obtained posterior samples, and corrects the posterior samples for the trend in the relationship.

Implementation based on Beaumont et al. (2002) and Blum (2017).

Parameters

methodstr, optional

The regression method to use:

linear: ordinary least squares regression;
loclinar: local linear regression (default).

transformbool, optional

If True, parameters are regressed on a logarithm scale. A log transformation will both stabilize the variance of the regression model and make it more homoscedastic. Note that the parameters must be positive in order to do a log transform. Default: True.

kernelstr, optional

The smoothing kernel function to use in local regression:

gaussian: Gaussian smoothing kernel;
epkov: Epanechnikov smoothing kernel.

return_journalbool, optional

If True, Journal is returned. Default: False.

References

M. Beaumont, W. Zhang and D.Balding. “Approximate Bayesian Computation in Population Genetics” GENETICS December 1, 2002 vol. 162 no. 4 2025-2035

M. Blum. “Regression approaches for approximate bayesian computation”. arXiv preprint arXiv:1707.01254, 2017 - arxiv.org

property stat_scale¶

The summary statistic scales.

Returns

numpy.ndarray

pylfi.inferences.RejABC

pylfi.priors.Prior