Computational Statistics Cheat Sheet
The core ideas of Computational Statistics distilled into a single, scannable reference — perfect for review or quick lookup.
Quick Reference
Bootstrap
A resampling technique introduced by Bradley Efron in 1979 that estimates the sampling distribution of a statistic by repeatedly drawing samples with replacement from the observed data, enabling inference without strong parametric assumptions.
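A minimal sketch of the idea (assuming NumPy is available; the data here are an illustrative toy sample):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=100)  # toy sample for illustration

# Draw 2000 resamples with replacement and record the mean of each.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(2000)
])

# 95% percentile confidence interval for the mean.
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"bootstrap 95% CI for the mean: [{lo:.2f}, {hi:.2f}]")
```

The same recipe works for any statistic — median, correlation, regression coefficient — by swapping out `.mean()`.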
Markov Chain Monte Carlo (MCMC)
A class of algorithms that draw samples from a probability distribution by constructing a Markov chain whose stationary distribution is the target distribution. Common variants include the Metropolis-Hastings algorithm and Gibbs sampling.
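A bare-bones random-walk Metropolis sampler, one of the simplest Metropolis-Hastings variants (a sketch assuming NumPy; the target here is a standard normal, chosen only so the result is easy to check):

```python
import numpy as np

def metropolis_hastings(log_target, n_samples, proposal_scale=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = x0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        x_new = x + rng.normal(scale=proposal_scale)
        # Accept with probability min(1, p(x_new) / p(x)), computed in log space.
        if np.log(rng.uniform()) < log_target(x_new) - log_target(x):
            x = x_new
        samples[i] = x  # on rejection, the current state is recorded again
    return samples

# Target: standard normal, specified as a log density up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x**2, n_samples=20000)
post = samples[5000:]  # discard burn-in
print(post.mean(), post.std())  # should be near 0 and 1
```

Note that only the *ratio* of target densities is needed, which is why MCMC works with unnormalized densities such as Bayesian posteriors.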
Monte Carlo Simulation
A broad class of computational methods that use repeated random sampling to obtain numerical results, typically to estimate quantities that are difficult or impossible to compute analytically.
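The classic toy example — estimating π from the fraction of random points that land inside a quarter circle (a sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000
# Uniform points in the unit square; the fraction inside the
# quarter circle x^2 + y^2 <= 1 converges to pi / 4.
x, y = rng.uniform(size=(2, n))
pi_hat = 4.0 * np.mean(x**2 + y**2 <= 1.0)
print(f"Monte Carlo estimate of pi: {pi_hat:.4f}")
```

The error of such estimates shrinks like 1/√n regardless of dimension, which is why Monte Carlo methods dominate in high-dimensional integration.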
Expectation-Maximization (EM) Algorithm
An iterative optimization algorithm for finding maximum likelihood estimates when the data are incomplete or involve latent variables. It alternates between computing the expected complete-data log-likelihood under the current parameter estimates (E-step) and maximizing that expectation with respect to the parameters (M-step).
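A sketch of EM for a two-component 1-D Gaussian mixture (assuming NumPy; data and initial values are illustrative):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
# Toy data: a mixture of N(-2, 1) and N(3, 1) with weights 0.3 / 0.7.
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# Rough initial guesses (illustrative).
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for _ in range(50):
    # E-step: responsibility of each component for each point.
    dens = pi[:, None] * normal_pdf(data[None, :], mu[:, None], sigma[:, None])
    resp = dens / dens.sum(axis=0)
    # M-step: re-estimate weights, means, and standard deviations.
    nk = resp.sum(axis=1)
    pi = nk / data.size
    mu = (resp @ data) / nk
    sigma = np.sqrt((resp * (data[None, :] - mu[:, None]) ** 2).sum(axis=1) / nk)

print(mu)  # component means should land near -2 and 3
```

Each iteration is guaranteed not to decrease the observed-data likelihood, though EM can converge to a local rather than global maximum.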
Kernel Density Estimation (KDE)
A non-parametric method for estimating the probability density function of a random variable by placing a smooth kernel function at each data point and summing the contributions, with a bandwidth parameter controlling the degree of smoothing.
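A from-scratch Gaussian KDE (a sketch assuming NumPy; the data, grid, and bandwidth are illustrative choices):

```python
import numpy as np

def gaussian_kde(data, grid, bandwidth):
    """Sum a Gaussian kernel centered at each data point, evaluated on `grid`."""
    diffs = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    # Normalize so the result is a valid density estimate.
    return kernels.sum(axis=1) / (data.size * bandwidth)

rng = np.random.default_rng(1)
data = rng.normal(size=500)
grid = np.linspace(-4, 4, 81)
density = gaussian_kde(data, grid, bandwidth=0.4)

# Sanity check: the estimate should integrate to about 1.
integral = density.sum() * (grid[1] - grid[0])
print(f"area under the KDE: {integral:.3f}")
```

A smaller bandwidth produces a spikier estimate, a larger one a smoother estimate; bandwidth choice matters far more in practice than the kernel shape.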
Permutation Test
A non-parametric hypothesis test that determines the p-value by computing the test statistic for every possible (or a large random subset of) reassignment of observations to groups, providing an exact or approximate test without distributional assumptions.
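A sketch of a two-sample permutation test for a difference in means, using a random subset of reassignments (assuming NumPy; the groups are simulated toy data with a real shift):

```python
import numpy as np

rng = np.random.default_rng(7)
a = rng.normal(0.0, 1.0, 40)   # group A (toy data)
b = rng.normal(0.8, 1.0, 40)   # group B, mean shifted by 0.8

observed = b.mean() - a.mean()
pooled = np.concatenate([a, b])

# Recompute the statistic under random reassignments of group labels.
n_perm = 5000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    stat = perm[40:].mean() - perm[:40].mean()
    count += stat >= observed

# Add-one correction so the p-value is never exactly zero.
p_value = (count + 1) / (n_perm + 1)
print(f"one-sided permutation p-value: {p_value:.4f}")
```

The only assumption is exchangeability of observations under the null hypothesis, which is what licenses shuffling the labels.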
Cross-Validation
A model assessment technique that partitions data into complementary subsets, using one subset for training and the other for validation, then rotating the partition to reduce bias in performance estimation. K-fold cross-validation is the most common variant.
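A minimal 5-fold cross-validation loop written from scratch (assuming NumPy; the model is simple least-squares regression on simulated data):

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 0.1, 200)  # noisy linear relation (toy data)

folds = k_fold_indices(len(y), k=5)
mse_scores = []
for i, test_idx in enumerate(folds):
    # Train on every fold except the held-out one.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    A = np.column_stack([X[train_idx, 0], np.ones(train_idx.size)])
    coef, *_ = np.linalg.lstsq(A, y[train_idx], rcond=None)
    pred = coef[0] * X[test_idx, 0] + coef[1]
    mse_scores.append(np.mean((pred - y[test_idx]) ** 2))

print(f"5-fold CV mean squared error: {np.mean(mse_scores):.4f}")
```

Averaging the held-out errors across folds gives an estimate of out-of-sample performance that uses every observation for both training and validation.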
Gibbs Sampling
A specific MCMC algorithm that generates samples from a multivariate distribution by iteratively sampling each variable from its conditional distribution given the current values of all other variables.
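A sketch for a bivariate normal target with correlation ρ, where both full conditionals are known in closed form (assuming NumPy):

```python
import numpy as np

# Target: bivariate standard normal with correlation rho.
# Full conditionals: x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
rho = 0.8
rng = np.random.default_rng(0)

x, y = 0.0, 0.0
samples = np.empty((20000, 2))
for i in range(20000):
    x = rng.normal(rho * y, np.sqrt(1 - rho**2))  # sample x given current y
    y = rng.normal(rho * x, np.sqrt(1 - rho**2))  # sample y given new x
    samples[i] = x, y

post = samples[2000:]  # discard burn-in
corr = np.corrcoef(post.T)[0, 1]
print(f"sample correlation: {corr:.3f}")  # should be close to rho
```

Gibbs sampling needs no tuning of a proposal distribution, but it mixes slowly when the variables are strongly correlated, as each update moves only along one coordinate.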
Variational Inference
An optimization-based approach to approximate Bayesian inference that reformulates the problem of computing the posterior distribution as an optimization problem: choose an approximating distribution from a tractable family and minimize the Kullback-Leibler divergence between it and the true posterior.
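A deliberately stripped-down caricature of the idea (assuming NumPy): fit a Gaussian q = N(m, s) to a known Gaussian "posterior" by gradient descent on the closed-form KL divergence. Real VI targets intractable posteriors and optimizes the ELBO instead, but the KL-minimization mechanics are the same.

```python
import numpy as np

# "True posterior": N(mu = 2, sigma = 0.5), known here only for checking.
mu, sigma = 2.0, 0.5

def kl(m, s):
    """Closed-form KL( N(m, s^2) || N(mu, sigma^2) )."""
    return np.log(sigma / s) + (s**2 + (m - mu) ** 2) / (2 * sigma**2) - 0.5

# Gradient descent on the variational parameters (m, s).
m, s, lr = 0.0, 1.0, 0.05
for _ in range(2000):
    grad_m = (m - mu) / sigma**2         # dKL/dm
    grad_s = -1.0 / s + s / sigma**2     # dKL/ds
    m -= lr * grad_m
    s -= lr * grad_s

print(m, s, kl(m, s))  # m -> 2, s -> 0.5, KL -> 0
```

Because the optimum here is the target itself, the KL drops to zero; with a restricted variational family it would instead settle at the closest member of that family.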
Importance Sampling
A Monte Carlo technique for estimating properties of a target distribution by drawing samples from a different, easier-to-sample proposal distribution and weighting each sample by the ratio of the target density to the proposal density.
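A sketch for a classic use case — a rare-event probability that naive Monte Carlo almost never observes (assuming NumPy):

```python
import numpy as np

def log_normal_pdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
n = 100_000

# Estimate P(X > 4) for X ~ N(0, 1). Naive sampling from N(0, 1) would need
# tens of thousands of draws per tail hit; a proposal centered at 4 hits
# the region of interest about half the time.
x = rng.normal(loc=4.0, scale=1.0, size=n)

# Weight each draw by target density / proposal density (in log space).
weights = np.exp(log_normal_pdf(x, 0.0, 1.0) - log_normal_pdf(x, 4.0, 1.0))
estimate = np.mean((x > 4.0) * weights)
print(estimate)  # true value is about 3.167e-5
```

The method is unbiased for any proposal that covers the target's support, but its variance depends heavily on how well the proposal matches the target in the region that matters.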