Information Theory Cheat Sheet
The core ideas of Information Theory distilled into a single, scannable reference — perfect for review or quick lookup.
Quick Reference
Shannon Entropy
A measure of the average uncertainty or information content in a random variable, defined as H(X) = -sum of p(x) log2 p(x) over all outcomes x. Higher entropy means greater unpredictability and more bits needed on average to encode each outcome.
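A minimal sketch of this formula in Python (the function name is illustrative):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum of p(x) * log2 p(x)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally unpredictable for two outcomes: 1 bit.
fair = entropy([0.5, 0.5])    # 1.0
# A biased coin is more predictable, so its entropy is lower.
biased = entropy([0.9, 0.1])  # about 0.469
```

Note the `if p > 0` guard: outcomes with zero probability contribute nothing to the sum, by the convention 0 log 0 = 0.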
Mutual Information
A measure of the amount of information that one random variable contains about another, quantifying the reduction in uncertainty about one variable given knowledge of the other. It is symmetric: I(X;Y) = I(Y;X).
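One way to compute it, sketched in Python from a joint distribution given as a 2-D table of p(x, y) (the function name is illustrative):

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum over x, y of p(x,y) * log2( p(x,y) / (p(x)p(y)) ), in bits."""
    px = [sum(row) for row in joint]            # marginal of X
    py = [sum(col) for col in zip(*joint)]      # marginal of Y
    return sum(
        p * math.log2(p / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0
    )

# Perfectly correlated variables: knowing X removes all uncertainty about Y.
mutual_information([[0.5, 0.0], [0.0, 0.5]])      # 1.0 bit
# Independent variables: knowing X tells us nothing about Y.
mutual_information([[0.25, 0.25], [0.25, 0.25]])  # 0.0 bits
```

Swapping rows and columns of the joint table leaves the result unchanged, which is the symmetry I(X;Y) = I(Y;X) in action.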
Channel Capacity
The maximum rate at which information can be reliably transmitted over a communication channel, measured in bits per channel use. Shannon's noisy-channel coding theorem proves that communication with arbitrarily small error probability is achievable at any rate below channel capacity.
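For one concrete channel, the binary symmetric channel with crossover probability p, the capacity has a closed form, C = 1 - H(p), where H is the binary entropy function. A short Python sketch (function names are illustrative):

```python
import math

def h2(p):
    """Binary entropy function H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with crossover probability p:
    C = 1 - H(p) bits per channel use."""
    return 1.0 - h2(p)

bsc_capacity(0.0)   # 1.0 — a noiseless channel carries a full bit per use
bsc_capacity(0.5)   # 0.0 — pure noise carries nothing
bsc_capacity(0.11)  # about 0.5
```

Note that capacity is zero at p = 0.5, not p = 1: a channel that always flips the bit is just as informative as a perfect one, since the receiver can invert everything.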
Data Compression (Source Coding)
The process of encoding information using fewer bits than the original representation. Shannon's source coding theorem establishes that the entropy of a source is the fundamental lower limit on the average number of bits per symbol achievable by any lossless compression scheme.
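The entropy bound can be checked numerically. The sketch below uses a hand-picked source with dyadic (power-of-two) probabilities and a matching prefix code, so the average code length meets the entropy bound exactly; for general sources, any uniquely decodable code satisfies average length >= entropy:

```python
import math

# Source symbols with dyadic probabilities and a matching prefix code
# (illustrative example; likelier symbols get shorter codewords).
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

src_entropy = -sum(p * math.log2(p) for p in probs.values())  # 1.75 bits/symbol
avg_len = sum(probs[s] * len(code[s]) for s in probs)         # 1.75 bits/symbol
```

A naive fixed-length code would spend 2 bits per symbol on this four-symbol alphabet, so the variable-length code saves 0.25 bits per symbol on average, and Shannon's theorem says no lossless code can do better than 1.75.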
Kullback-Leibler Divergence
A non-symmetric measure of how one probability distribution P diverges from a reference distribution Q, defined as D_KL(P||Q) = sum of P(x) log(P(x)/Q(x)). It is always non-negative and equals zero only when the distributions are identical.
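A minimal Python sketch of the formula, using base-2 logarithms so the result is in bits (the function name is illustrative):

```python
import math

def kl_divergence(p, q):
    """D_KL(P||Q) = sum of P(x) * log2(P(x)/Q(x)), in bits.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]
kl_divergence(p, p)  # 0.0 — identical distributions
kl_divergence(p, q)  # positive, and differs from kl_divergence(q, p)
```

Comparing `kl_divergence(p, q)` with `kl_divergence(q, p)` makes the non-symmetry concrete: the two directions generally give different values.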
Error-Correcting Codes
Techniques for adding structured redundancy to transmitted data so that the receiver can detect and correct errors introduced by a noisy channel. Shannon proved that codes exist which achieve vanishing error probability at any rate below channel capacity.
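The simplest example of structured redundancy is the 3x repetition code, sketched below (it is far from capacity-achieving, but it shows the detect-and-correct idea; function names are illustrative):

```python
def encode(bits):
    """3x repetition code: each bit is transmitted three times."""
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(received):
    """Majority vote over each block of three corrects any single bit flip
    within that block."""
    return [1 if sum(received[i:i + 3]) >= 2 else 0
            for i in range(0, len(received), 3)]

sent = encode([1, 0, 1])  # [1, 1, 1, 0, 0, 0, 1, 1, 1]
sent[1] ^= 1              # noisy channel flips one bit
decode(sent)              # [1, 0, 1] — the error is corrected
```

The price is rate: this code sends 3 channel bits per data bit (rate 1/3). Practical codes such as Hamming, Reed-Solomon, and LDPC codes add redundancy far more efficiently.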
Cross-Entropy
A measure from information theory that quantifies the average number of bits needed to encode data from distribution P when using a code optimized for distribution Q. It equals H(P) + D_KL(P||Q), combining true entropy with the divergence penalty.
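The decomposition H(P) + D_KL(P||Q) can be verified numerically with a short Python sketch (function names are illustrative):

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum of P(x) * log2 Q(x): average bits to encode samples
    from P using a code optimized for Q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.5], [0.9, 0.1]
cross_entropy(p, q)    # about 1.737 bits
entropy(p) + kl(p, q)  # same value: the decomposition holds
```

The divergence term is the penalty for coding with the wrong distribution: here the mismatched code costs about 0.737 bits per symbol more than the optimal 1 bit. This quantity is also the standard loss function for training classifiers in machine learning.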
Joint and Conditional Entropy
Joint entropy H(X,Y) measures the total uncertainty of two variables considered together, while conditional entropy H(Y|X) measures the remaining uncertainty in Y after observing X. The chain rule relates them: H(X,Y) = H(X) + H(Y|X).
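The chain rule can be checked on a small joint distribution (a hand-picked example; the helper name is illustrative):

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Joint distribution p(x, y): rows index x, columns index y.
joint = [[0.25, 0.25],
         [0.50, 0.00]]

h_xy = H([p for row in joint for p in row])  # H(X,Y) = 1.5 bits
h_x = H([sum(row) for row in joint])         # H(X)   = 1.0 bit

# H(Y|X) = sum over x of p(x) * H(Y | X = x)
h_y_given_x = 0.0
for row in joint:
    px = sum(row)
    if px > 0:
        h_y_given_x += px * H([p / px for p in row])  # 0.5 bits

# Chain rule: H(X,Y) == H(X) + H(Y|X)
```

Here observing X = 1 removes all remaining uncertainty about Y (that row has a single nonzero entry), while X = 0 leaves a full bit, averaging out to H(Y|X) = 0.5.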
Redundancy
The difference between the maximum possible entropy of a source (if all symbols were equally likely) and its actual entropy. Redundancy represents the exploitable structure in data that makes compression possible and also provides natural error resilience.
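A rough Python sketch of this difference, measured per symbol from observed frequencies (a simplification: it takes the alphabet to be the set of symbols that actually occur, and the function name is illustrative):

```python
import math
from collections import Counter

def redundancy(text):
    """Redundancy per symbol = log2(alphabet size) - actual entropy of the
    observed symbol frequencies, in bits. Alphabet = symbols seen in text."""
    counts = Counter(text)
    n = len(text)
    actual = -sum(c / n * math.log2(c / n) for c in counts.values())
    return math.log2(len(counts)) - actual

redundancy("abab")  # 0.0 — both symbols equally likely, nothing to exploit
redundancy("aaab")  # about 0.189 — skewed frequencies leave room to compress
```

Real text has far more redundancy than this single-symbol view captures, because dependencies between neighboring symbols (like "q" being followed by "u") add further exploitable structure.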
Noisy-Channel Coding Theorem
Shannon's foundational result proving that for any communication channel with capacity C, there exist coding schemes that allow transmission at rates up to C with arbitrarily small error probability, but no scheme can reliably exceed rate C.