pylfi.inferences.ABCBase¶
- class pylfi.inferences.ABCBase(observation, simulator, stat_calc, priors, inference_scheme, log=True)[source]¶
Base class for Approximate Bayesian Computation algorithms.
- Parameters
- observationarray_like
The observed data \(y_\mathrm{obs}\).
- simulator
callable The simulator model parametrized by unknown model parameters \(\theta\). Any regular
Pythoncallable, i.e., function or class with__call__method can be used.- stat_calc
callable Summary statistics calculator. Must take the return from
simulatoras arguments. Any regularPythoncallable, i.e., function or class with__call__method can be used.- priors
listofPrior List containing the priors for the unknown model parameters. The order must be the same as the order of positional arguments in
simulator.- inference_scheme
str Name of inference scheme. To be passed by sub-class in call to base constructor.
- log
bool Whether logger should be displayed or not. Default:
True.
Attributes
Tolerance parameter.
The chosen q-quantile.
The summary statistic scales.
Methods
batches(n_samples, n_jobs, seed[, force_equal])Divide and conquer.
distance(s_sim, s_obs[, weight, scale])Weighted Euclidean distance.
journal()To be overwritten by sub-class: method to write and return
Journalinstance.pilot_study([n_sim, quantile, stat_scale, ...])Perform pilot study.
reg_adjust([method, transform, kernel, ...])Post-sampling regression adjustment.
sample()To be overwritten by sub-class: should implement sampling from inference scheme and optionally return
Journalvia the booleanreturn_journalkeyword argument.- abstract sample()[source]¶
To be overwritten by sub-class: should implement sampling from inference scheme and optionally return
Journalvia the booleanreturn_journalkeyword argument.The sample method needs specific attribute names. See
RejABCclass for an example.
- abstract journal()[source]¶
To be overwritten by sub-class: method to write and return
Journalinstance.See
RejABCclass for an example.- Returns
JournalJournal with results and information created by the run of inference schemes.
- distance(s_sim, s_obs, weight=1.0, scale=1.0)[source]¶
Weighted Euclidean distance.
Computes the weighted Euclidean distance between two 1-D arrays of summary statistics. The
weightparameter can be set to weight the importance of a summary statistic, whereas thescaleparameter can be used to scale particular summary statistics in order to avoid dominance.- Parameters
- s_sim{
int,float}, array_like Simulated summary statistic(s).
- s_obs{
int,float}, array_like Observed summary statistic(s).
- weight{
int,float},numpy.ndarray, optional Importance weight(s) of summary statistic(s). Should sum to 1. Default:
1.0.- scale{
int,float},numpy.ndarray, optional Scale weight(s) of summary statistic(s). Default:
1.0.
- s_sim{
- Returns
- distance
float The weighted Euclidean distance between simulated and observed summary statistics.
- distance
- batches(n_samples, n_jobs, seed, force_equal=False)[source]¶
Divide and conquer.
Divide the number of samples,
n_samples, betweenn_jobsworkers. If the ABC algorithm uses Markov chains, thenn_jobsis the number chains. Ifn_jobsexceeds the number of available CPUs found by Pathos (this might include hardware threads),n_jobsis set to the number found by Pathos.The method also creates a seed for each worker based on the input seed. This ensures reproducibility if wanted. If
seed=None, the results will be stochastic between each run of the sampler.When using Markov chains, the
force_equalkeyword must be set toTrue, in order to enforce equal length of chains. Then_samplesthat results in equal chain lengths will be found internally and returned. The correctedn_samplesshould be used downstream if needed.- Parameters
- n_samples
int Number of (posterior) samples to draw.
- n_jobs
int Number of processes (workers). If
n_jobs=-1, thenn_jobsis set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those).- seed
int User-provided seed.
- force_equal
bool, optional If
True,n_sampleswill be adjusted to enforce equal number of tasks for each worker. Default:False.
- n_samples
- Returns
- pilot_study(n_sim=500, quantile=None, stat_scale=None, stat_weight=1.0, n_jobs=- 1, seed=None)[source]¶
Perform pilot study.
The pilot study runs the simulator
n_simtimes and sets the threshold parameterepsilonautomatically as the q-quantile of simulated distances from the prior predictive distribution. For instance, the 0.5-quantile (the median) will give a threshold that accepts 50% of the simulations.The pilot study can also be used to provide an estimate of the
stat_scaleparameter, used in the weighted Euclidean distance, from the prior predictive distribution by passing thestat_scalekeyword assdormad. Thestat_scaleparameter is used to avoid dominance of particular summary statistics.stat_scale=sdscales the summary statistics according to their standard deviation (SD) estimated from the prior predictive samples, andstat_scale=madaccording to their median absolute deviation (MAD).It is important to note that if more than 50% of the prior predictive samples for a particular summary statistic have identical values, MAD will equal zero. In this case, the logger will raise a warning and the scale for the particular summary statistic will be set to SD instead. If there is no variability at all, the scale will be set to 1.
It is recommended to check that there are not too many identical samples before setting the scale, to avoid surprises.
- Parameters
- n_sims
int, optional Number of simulator runs.
- quantile
int Quantile of the Euclidean distances.
- stat_scale
str, optional Summary statistics scale to estimate; can be set as either
sd(standard deviation) ormad(median absolute deviation). IfNone, scale is set to1.0. Default:None.- stat_weight{
int,float},numpy.ndarray, optional Importance weights of summary statistics. Default:
1.0.- n_jobs
int, optional Number of processes (workers). If
n_jobs=-1, thenn_jobsis set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those). Default:-1.- seed
int User-provided seed. Will be used to generate seed for each worker. Default:
None.
- n_sims
- reg_adjust(method='loclinear', transform=True, kernel='epkov', return_journal=False)[source]¶
Post-sampling regression adjustment.
Regresses summary statistics on the obtained posterior samples, and corrects the posterior samples for the trend in the relationship.
Implementation based on Beaumont et al. (2002) and Blum (2017).
- Parameters
- method
str, optional - The regression method to use:
linear: ordinary least squares regression;loclinar: local linear regression (default).
- transform
bool, optional If
True, parameters are regressed on a logarithm scale. A log transformation will both stabilize the variance of the regression model and make it more homoscedastic. Note that the parameters must be positive in order to do a log transform. Default:True.- kernel
str, optional - The smoothing kernel function to use in local regression:
gaussian: Gaussian smoothing kernel;epkov: Epanechnikov smoothing kernel.
- return_journal
bool, optional If
True,Journalis returned. Default:False.
- method
References
M. Beaumont, W. Zhang and D.Balding. “Approximate Bayesian Computation in Population Genetics” GENETICS December 1, 2002 vol. 162 no. 4 2025-2035
M. Blum. “Regression approaches for approximate bayesian computation”. arXiv preprint arXiv:1707.01254, 2017 - arxiv.org
- property stat_scale¶
The summary statistic scales.
- Returns