pylfi.inferences.MCMCABC¶
- class pylfi.inferences.MCMCABC(observation, simulator, stat_calc, priors, log=True)[source]¶
Class implementing the MCMC ABC algorithm.
Attributes
Tolerance parameter.
The chosen q-quantile.
The summary statistic scales.
prop_scale
Methods
batches(n_samples, n_jobs, seed[, force_equal])Divide and conquer.
distance(s_sim, s_obs[, weight, scale])Weighted Euclidean distance.
journal()Create and return an instance of Journal class.
pilot_study([n_sim, quantile, stat_scale, ...])Perform pilot study.
reg_adjust([method, transform, kernel, ...])Post-sampling regression adjustment.
sample(n_samples[, epsilon, prop_scale, ...])tune: bool
tune([prop_scale, epsilon, tune_iter, ...])Tune the proposal scale
- tune(prop_scale=0.5, epsilon=None, tune_iter=500, tune_interval=100, stat_weight=1.0, stat_scale=1.0, seed=None, use_pilot=False)[source]¶
Tune the proposal scale
So how do we choose sd for the proposal distribution? There are some papers that suggest Metropolis-Hastings is most efficient when you accept 23.4% of proposed samples, and it turns out that lowering step size increases the probability of accepting a proposal. PyMC3 will spend the first 500 steps increasing and decreasing the step size to try to find the best value of sd that will give you an acceptance rate of 23.4% (you can even set different acceptance rates).
The problem is that if you change the step size while sampling, you lose the guarantees that your samples (asymptotically) come from the target distribution, so you should typically discard these. Also, there is typically a lot more adaptation going on in those first steps than just step_size.
- sample(n_samples, epsilon=None, prop_scale=0.5, burn=100, tune=True, tune_iter=500, tune_interval=100, stat_weight=1.0, stat_scale=1.0, use_pilot=False, chains=2, seed=None, return_journal=False)[source]¶
- tune: bool
Flag for tuning. Defaults to True.
- tune_interval: int
The frequency of tuning. Defaults to 100 iterations.
Due to multiprocessing, estimation time (iteration per loop, total time, etc.) could be unstable, but the progress bar works perfectly.
A good choice for the number of jobs is the number of cores or processors on your computer. If your processor supports hyperthreading, you can select an even higher number of jobs. The number of jobs is set to the number of cores found in the system by default.
There are some papers that suggest Metropolis-Hastings is most efficient when you accept 23.4% of proposed samples, and it turns out that lowering step size increases the probability of accepting a proposal. PyMC3 will spend the first 500 steps increasing and decreasing the step size to try to find the best value of sd that will give you an acceptance rate of 23.4% (you can even set different acceptance rates).
burn : either burn away or add to n_samples
- batches(n_samples, n_jobs, seed, force_equal=False)¶
Divide and conquer.
Divide the number of samples,
n_samples, betweenn_jobsworkers. If the ABC algorithm uses Markov chains, thenn_jobsis the number chains. Ifn_jobsexceeds the number of available CPUs found by Pathos (this might include hardware threads),n_jobsis set to the number found by Pathos.The method also creates a seed for each worker based on the input seed. This ensures reproducibility if wanted. If
seed=None, the results will be stochastic between each run of the sampler.When using Markov chains, the
force_equalkeyword must be set toTrue, in order to enforce equal length of chains. Then_samplesthat results in equal chain lengths will be found internally and returned. The correctedn_samplesshould be used downstream if needed.- Parameters
- n_samples
int Number of (posterior) samples to draw.
- n_jobs
int Number of processes (workers). If
n_jobs=-1, thenn_jobsis set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those).- seed
int User-provided seed.
- force_equal
bool, optional If
True,n_sampleswill be adjusted to enforce equal number of tasks for each worker. Default:False.
- n_samples
- Returns
- distance(s_sim, s_obs, weight=1.0, scale=1.0)¶
Weighted Euclidean distance.
Computes the weighted Euclidean distance between two 1-D arrays of summary statistics. The
weightparameter can be set to weight the importance of a summary statistic, whereas thescaleparameter can be used to scale particular summary statistics in order to avoid dominance.- Parameters
- s_sim{
int,float}, array_like Simulated summary statistic(s).
- s_obs{
int,float}, array_like Observed summary statistic(s).
- weight{
int,float},numpy.ndarray, optional Importance weight(s) of summary statistic(s). Should sum to 1. Default:
1.0.- scale{
int,float},numpy.ndarray, optional Scale weight(s) of summary statistic(s). Default:
1.0.
- s_sim{
- Returns
- distance
float The weighted Euclidean distance between simulated and observed summary statistics.
- distance
- pilot_study(n_sim=500, quantile=None, stat_scale=None, stat_weight=1.0, n_jobs=- 1, seed=None)¶
Perform pilot study.
The pilot study runs the simulator
n_simtimes and sets the threshold parameterepsilonautomatically as the q-quantile of simulated distances from the prior predictive distribution. For instance, the 0.5-quantile (the median) will give a threshold that accepts 50% of the simulations.The pilot study can also be used to provide an estimate of the
stat_scaleparameter, used in the weighted Euclidean distance, from the prior predictive distribution by passing thestat_scalekeyword assdormad. Thestat_scaleparameter is used to avoid dominance of particular summary statistics.stat_scale=sdscales the summary statistics according to their standard deviation (SD) estimated from the prior predictive samples, andstat_scale=madaccording to their median absolute deviation (MAD).It is important to note that if more than 50% of the prior predictive samples for a particular summary statistic have identical values, MAD will equal zero. In this case, the logger will raise a warning and the scale for the particular summary statistic will be set to SD instead. If there is no variability at all, the scale will be set to 1.
It is recommended to check that there are not too many identical samples before setting the scale, to avoid surprises.
- Parameters
- n_sims
int, optional Number of simulator runs.
- quantile
int Quantile of the Euclidean distances.
- stat_scale
str, optional Summary statistics scale to estimate; can be set as either
sd(standard deviation) ormad(median absolute deviation). IfNone, scale is set to1.0. Default:None.- stat_weight{
int,float},numpy.ndarray, optional Importance weights of summary statistics. Default:
1.0.- n_jobs
int, optional Number of processes (workers). If
n_jobs=-1, thenn_jobsis set to half of the CPUs found by Pathos (we assume half of the CPUs are hardware threads only and ignore those). Default:-1.- seed
int User-provided seed. Will be used to generate seed for each worker. Default:
None.
- n_sims
- reg_adjust(method='loclinear', transform=True, kernel='epkov', return_journal=False)¶
Post-sampling regression adjustment.
Regresses summary statistics on the obtained posterior samples, and corrects the posterior samples for the trend in the relationship.
Implementation based on Beaumont et al. (2002) and Blum (2017).
- Parameters
- method
str, optional - The regression method to use:
linear: ordinary least squares regression;loclinar: local linear regression (default).
- transform
bool, optional If
True, parameters are regressed on a logarithm scale. A log transformation will both stabilize the variance of the regression model and make it more homoscedastic. Note that the parameters must be positive in order to do a log transform. Default:True.- kernel
str, optional - The smoothing kernel function to use in local regression:
gaussian: Gaussian smoothing kernel;epkov: Epanechnikov smoothing kernel.
- return_journal
bool, optional If
True,Journalis returned. Default:False.
- method
References
M. Beaumont, W. Zhang and D.Balding. “Approximate Bayesian Computation in Population Genetics” GENETICS December 1, 2002 vol. 162 no. 4 2025-2035
M. Blum. “Regression approaches for approximate bayesian computation”. arXiv preprint arXiv:1707.01254, 2017 - arxiv.org
- property stat_scale¶
The summary statistic scales.
- Returns