Computational Statistics Cheat Sheet
The core ideas of Computational Statistics distilled into a single, scannable reference — perfect for review or quick lookup.
Quick Reference
Bootstrap
A resampling technique introduced by Bradley Efron in 1979 that estimates the sampling distribution of a statistic by repeatedly drawing samples with replacement from the observed data, enabling inference without strong parametric assumptions.
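A minimal sketch of the idea (assuming NumPy is available; the data here are an illustrative toy sample):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # toy sample for illustration

# Draw 2000 resamples with replacement and record the mean of each.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(2000)
])

# 95% percentile confidence interval for the mean.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for the mean: [{lo:.2f}, {hi:.2f}]")
```

The same recipe works for any statistic — median, correlation, regression coefficient — by swapping out `.mean()`.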
Markov Chain Monte Carlo (MCMC)
A class of algorithms that draw samples from a probability distribution by constructing a Markov chain whose stationary distribution is the target distribution. Common variants include the Metropolis-Hastings algorithm and Gibbs sampling.
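A bare-bones random-walk Metropolis sampler, one of the simplest Metropolis-Hastings variants (a sketch assuming NumPy; the target here is a standard normal, chosen only so the result is easy to check):

```python
import numpy as np

def metropolis_hastings(log_target, n_samples, proposal_scale=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_new = x + rng.normal(scale=proposal_scale)
        # Accept with probability min(1, p(x_new) / p(x)), computed in log space.
        if np.log(rng.uniform()) < log_target(x_new) - log_target(x):
            x = x_new
        samples[i] = x  # on rejection, the current state is recorded again
    return samples

# Target: standard normal, specified as a log density up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x**2, n_samples=20000)
post = samples[5000:]  # discard burn-in
print(post.mean(), post.std())  # should be near 0 and 1
```

Note that only the *ratio* of target densities is needed, which is why MCMC works with unnormalized densities such as Bayesian posteriors.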
Monte Carlo Simulation
A broad class of computational methods that use repeated random sampling to obtain numerical results, typically to estimate quantities that are difficult or impossible to compute analytically.
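The classic toy example — estimating π from the fraction of random points that land inside a quarter circle (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
# Uniform points in the unit square; the fraction inside the
# quarter circle x^2 + y^2 <= 1 converges to pi / 4.
x, y = rng.uniform(size=(2, n))
pi_hat = 4.0 * np.mean(x**2 + y**2 <= 1.0)
print(f"Monte Carlo estimate of pi: {pi_hat:.4f}")
```

The error of such estimates shrinks like 1/√n regardless of dimension, which is why Monte Carlo methods dominate in high-dimensional integration.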
Expectation-Maximization (EM) Algorithm
An iterative optimization algorithm for finding maximum likelihood estimates when the data are incomplete or involve latent variables. It alternates between computing the expected complete-data log-likelihood under the current parameter estimates (E-step) and maximizing that expectation with respect to the parameters (M-step).
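A sketch of EM for a two-component 1-D Gaussian mixture (assuming NumPy; data and initial values are illustrative):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Toy data: a mixture of N(-2, 1) and N(3, 1) with weights 0.3 / 0.7.
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# Rough initial guesses (illustrative).
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibility of each component for each point.
    dens = pi[:, None] * normal_pdf(data[None, :], mu[:, None], sigma[:, None])
    resp = dens / dens.sum(axis=0)
    # M-step: re-estimate weights, means, and standard deviations.
    nk = resp.sum(axis=1)
    pi = nk / data.size
    mu = (resp @ data) / nk
    sigma = np.sqrt((resp * (data[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk)

print(mu)  # component means should land near -2 and 3
```

Each iteration is guaranteed not to decrease the observed-data likelihood, though EM can converge to a local rather than global maximum.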
Kernel Density Estimation (KDE)
A non-parametric method for estimating the probability density function of a random variable by placing a smooth kernel function at each data point and summing the contributions, with a bandwidth parameter controlling the degree of smoothing.
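A from-scratch Gaussian KDE (a sketch assuming NumPy; the data, grid, and bandwidth are illustrative choices):

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth):
    """Sum a Gaussian kernel centered at each data point, evaluated on `grid`."""
    diffs = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    # Normalize so the result is a valid density estimate.
    return kernels.sum(axis=1) / (data.size * bandwidth)

rng = np.random.default_rng(1)
data = rng.normal(size=500)
grid = np.linspace(-4, 4, 81)
density = gaussian_kde(data, grid, bandwidth=0.4)

# Sanity check: the estimate should integrate to about 1.
integral = density.sum() * (grid[1] - grid[0])
print(f"area under the KDE: {integral:.3f}")
```

A smaller bandwidth produces a spikier estimate, a larger one a smoother estimate; bandwidth choice matters far more in practice than the kernel shape.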
Permutation Test
A non-parametric hypothesis test that determines the p-value by computing the test statistic for every possible (or a large random subset of) reassignment of observations to groups, providing an exact or approximate test without distributional assumptions.
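A sketch of a two-sample permutation test for a difference in means, using a random subset of reassignments (assuming NumPy; the groups are simulated toy data with a real shift):

```python
import numpy as np

rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, 40)   # group A (toy data)
b = rng.normal(0.8, 1.0, 40)   # group B, mean shifted by 0.8

observed = b.mean() - a.mean()
pooled = np.concatenate([a, b])

# Recompute the statistic under random reassignments of group labels.
n_perm = 5000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    stat = perm[40:].mean() - perm[:40].mean()
    count += stat >= observed

# Add-one correction so the p-value is never exactly zero.
p_value = (count + 1) / (n_perm + 1)
print(f"one-sided permutation p-value: {p_value:.4f}")
```

The only assumption is exchangeability of observations under the null hypothesis, which is what licenses shuffling the labels.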
Cross-Validation
A model assessment technique that partitions data into complementary subsets, using one subset for training and the other for validation, then rotating the partition to reduce bias in performance estimation. K-fold cross-validation is the most common variant.
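A minimal 5-fold cross-validation loop written from scratch (assuming NumPy; the model is simple least-squares regression on simulated data):

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, 200)  # noisy linear relation (toy data)

folds = k_fold_indices(len(y), k=5)
mse_scores = []
for i, test_idx in enumerate(folds):
    # Train on every fold except the held-out one.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    A = np.column_stack([X[train_idx, 0], np.ones(train_idx.size)])
    coef, *_ = np.linalg.lstsq(A, y[train_idx], rcond=None)
    pred = coef[0] * X[test_idx, 0] + coef[1]
    mse_scores.append(np.mean((pred - y[test_idx]) ** 2))

print(f"5-fold CV mean squared error: {np.mean(mse_scores):.4f}")
```

Averaging the held-out errors across folds gives an estimate of out-of-sample performance that uses every observation for both training and validation.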
Gibbs Sampling
A specific MCMC algorithm that generates samples from a multivariate distribution by iteratively sampling each variable from its conditional distribution given the current values of all other variables.
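A sketch for a bivariate normal target with correlation ρ, where both full conditionals are known in closed form (assuming NumPy):

```python
import numpy as np

# Target: bivariate standard normal with correlation rho.
# Full conditionals: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
rho = 0.8
rng = np.random.default_rng(0)

x, y = 0.0, 0.0
samples = np.empty((20000, 2))
for i in range(20000):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))  # sample x given current y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # sample y given new x
    samples[i] = x, y

post = samples[2000:]  # discard burn-in
corr = np.corrcoef(post.T)[0, 1]
print(f"sample correlation: {corr:.3f}")  # should be close to rho
```

Gibbs sampling needs no tuning of a proposal distribution, but it mixes slowly when the variables are strongly correlated, as each update moves only along one coordinate.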
Variational Inference
An optimization-based approach to approximate Bayesian inference that reformulates the problem of computing the posterior distribution as an optimization problem: choose an approximating distribution from a tractable family and minimize the Kullback-Leibler divergence between it and the true posterior.
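A deliberately stripped-down caricature of the idea (assuming NumPy): fit a Gaussian q = N(m, s) to a known Gaussian "posterior" by gradient descent on the closed-form KL divergence. Real VI targets intractable posteriors and optimizes the ELBO instead, but the KL-minimization mechanics are the same.

```python
import numpy as np

# "True posterior": N(mu = 2, sigma = 0.5), known here only for checking.
mu, sigma = 2.0, 0.5

def kl(m, s):
    """Closed-form KL( N(m, s^2) || N(mu, sigma^2) )."""
    return np.log(sigma / s) + (s**2 + (m - mu) ** 2) / (2 * sigma**2) - 0.5

# Gradient descent on the variational parameters (m, s).
m, s, lr = 0.0, 1.0, 0.05
for _ in range(2000):
    grad_m = (m - mu) / sigma**2         # dKL/dm
    grad_s = -1.0 / s + s / sigma**2     # dKL/ds
    m -= lr * grad_m
    s -= lr * grad_s

print(m, s, kl(m, s))  # m -> 2, s -> 0.5, KL -> 0
```

Because the optimum here is the target itself, the KL drops to zero; with a restricted variational family it would instead settle at the closest member of that family.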
Importance Sampling
A Monte Carlo technique for estimating properties of a target distribution by drawing samples from a different, easier-to-sample proposal distribution and weighting each sample by the ratio of the target density to the proposal density.
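A sketch for a classic use case — a rare-event probability that naive Monte Carlo almost never observes (assuming NumPy):

```python
import numpy as np

def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
n = 100_000

# Estimate P(X > 4) for X ~ N(0, 1). Naive sampling from N(0, 1) would need
# tens of thousands of draws per tail hit; a proposal centered at 4 hits
# the region of interest about half the time.
x = rng.normal(loc=4.0, scale=1.0, size=n)

# Weight each draw by target density / proposal density (in log space).
weights = np.exp(log_normal_pdf(x, 0.0, 1.0) - log_normal_pdf(x, 4.0, 1.0))
estimate = np.mean((x > 4.0) * weights)
print(estimate)  # true value is about 3.167e-5
```

The method is unbiased for any proposal that covers the target's support, but its variance depends heavily on how well the proposal matches the target in the region that matters.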