Information theory is a mathematical framework for quantifying, storing, and communicating information, originally developed by Claude Shannon in his landmark 1948 paper "A Mathematical Theory of Communication." At its core, the theory provides precise definitions for intuitive concepts like information content, uncertainty, and redundancy using the language of probability and logarithms. Shannon demonstrated that information can be measured in bits, established fundamental limits on data compression (the source coding theorem), and proved that reliable communication is possible over a noisy channel at any rate below a calculable maximum, the channel capacity (the noisy-channel coding theorem). These results laid the theoretical groundwork for the entire digital age.
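As a concrete illustration of a calculable capacity, consider the binary symmetric channel, which flips each transmitted bit with some crossover probability p; its capacity is 1 − H(p) bits per channel use, where H is the binary entropy function. The sketch below (plain Python; the function names are my own) computes it:

```python
import math

def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with the convention H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(crossover: float) -> float:
    """Capacity of a binary symmetric channel: C = 1 - H(p) bits per use."""
    return 1.0 - binary_entropy(crossover)
```

A noiseless channel (p = 0) has capacity 1 bit per use, while a channel that flips bits half the time (p = 0.5) carries no information at all, since its output is independent of its input.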
The central quantity in information theory is entropy, which measures the average uncertainty or surprise associated with a random variable's outcomes. High entropy indicates greater unpredictability and thus more information content per message, while low entropy signals redundancy and predictability. Related measures such as mutual information, relative entropy (Kullback-Leibler divergence), and conditional entropy allow researchers to quantify how much one variable reveals about another, how different two probability distributions are, and how much uncertainty remains after partial observation. These tools have proven indispensable not only in communications engineering but also in statistics, machine learning, neuroscience, and physics.
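The quantities above all reduce to short sums over probability tables. A minimal sketch (plain Python; function names are my own, and it assumes distributions given as lists summing to 1, with the usual convention 0 log 0 = 0):

```python
import math

def entropy(p):
    """Shannon entropy H(P) = -sum p log2 p, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q):
    """Relative entropy D(P||Q) in bits; assumes q > 0 wherever p > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

def mutual_information(joint):
    """I(X;Y) from a joint table joint[x][y]: how much X reveals about Y.
    Equals D(P(x,y) || P(x)P(y)), so it is zero iff X and Y are independent."""
    px = [sum(row) for row in joint]          # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )
```

For example, a fair coin has entropy 1 bit; the joint table [[0.25, 0.25], [0.25, 0.25]] (independent fair bits) gives mutual information 0, while [[0.5, 0], [0, 0.5]] (perfectly correlated bits) gives 1 bit, the entire uncertainty of either variable.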
Beyond its origins in electrical engineering, information theory has become a unifying language across the sciences. In machine learning, cross-entropy and KL divergence are standard loss functions for training classification models and variational autoencoders. In biology, information-theoretic measures help quantify genetic diversity and neural coding efficiency. In physics, the Bekenstein-Hawking entropy connects information to black hole thermodynamics, while quantum information theory extends Shannon's framework to qubits and entanglement. Whether one is designing a 5G cellular network, compressing video, training a large language model, or studying the thermodynamics of computation, information theory provides the essential quantitative toolkit.
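To make the machine-learning connection concrete: for a one-hot target, the cross-entropy loss collapses to the negative log-probability the model assigns to the true class, which is why minimizing it pushes that probability toward 1. A minimal sketch (plain Python, not a real framework's API; the epsilon guard against log(0) is a common practical convention):

```python
import math

def cross_entropy_loss(probs, true_class, eps=1e-12):
    """Cross-entropy for one example with a one-hot target:
    -log of the probability assigned to the true class.
    eps guards against log(0) for a confidently wrong model."""
    return -math.log(probs[true_class] + eps)
```

A model that assigns probability 1 to the correct class incurs (near-)zero loss, while a uniform guess over two classes costs log 2 nats, mirroring the entropy of a fair coin.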