Information Retrieval Cheat Sheet
The core ideas of information retrieval (IR) distilled into a single, scannable reference for review or quick lookup.
Quick Reference
Inverted Index
A data structure that maps each term in a vocabulary to a list of documents (or positions within documents) where that term appears, enabling fast full-text search. It is the fundamental building block of most modern search engines.
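As a minimal sketch of the idea, the snippet below builds a document-level inverted index over a tiny made-up collection. The function name `build_inverted_index`, the whitespace tokenizer, and the sample documents are assumptions for illustration; real engines add normalization, positions, and compression.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace (an assumption).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical toy collection.
docs = {
    1: "cats chase mice",
    2: "dogs chase cats",
    3: "mice eat cheese",
}
index = build_inverted_index(docs)
print(index["cats"])  # [1, 2]
print(index["mice"])  # [1, 3]
```

Looking up a term is then a single dictionary access, which is why queries do not need to scan every document.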
TF-IDF (Term Frequency-Inverse Document Frequency)
A numerical statistic that reflects the importance of a term in a document relative to a collection. Term frequency measures how often a term appears in a document, while inverse document frequency reduces the weight of terms that appear in many documents.
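One common formulation, sketched here under stated assumptions, multiplies length-normalized term frequency by log(N / df), where N is the number of documents and df is the number of documents containing the term. Many other weighting variants exist; the function name `tf_idf` and the toy corpus are illustrative only.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF using tf = count / doc length and idf = log(N / df)."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0  # term never occurs in the collection
    idf = math.log(len(corpus) / df)
    return tf * idf

# Hypothetical pre-tokenized corpus.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
    ["the", "cat", "purred"],
]
score_the = tf_idf("the", corpus[0], corpus)  # appears in every document
score_cat = tf_idf("cat", corpus[0], corpus)  # appears in two documents
score_sat = tf_idf("sat", corpus[0], corpus)  # appears in one document
```

In this example "the" scores exactly zero because it occurs in every document, while the rarer "sat" outscores "cat".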
Precision and Recall
Two fundamental evaluation metrics in IR. Precision is the fraction of retrieved documents that are relevant, while recall is the fraction of all relevant documents that are retrieved. Together they capture the trade-off between returning only relevant results and returning all relevant results.
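The two definitions translate directly into set arithmetic. The sketch below assumes unranked result sets identified by document ID; the function name and the sample IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for an unranked result set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 4 documents retrieved, 5 relevant in total, 2 in common.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7])
print(p, r)  # 0.5 0.4
```

Retrieving more documents can only keep recall the same or raise it, but usually lowers precision, which is the trade-off described above.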
Vector Space Model
A mathematical model for representing documents and queries as vectors in a high-dimensional space, where each dimension corresponds to a term. Relevance is computed as the similarity (often cosine similarity) between the query vector and document vectors.
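To make the model concrete, the following sketch represents the query and each document as sparse term-count vectors and ranks documents by cosine similarity. Raw counts are used for simplicity (TF-IDF weights are more typical), and the query and documents are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical query and documents as raw term-count vectors.
query = Counter("cheap flights".split())
docs = {
    "d1": Counter("cheap flights to paris".split()),
    "d2": Counter("paris hotels".split()),
}
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # ['d1', 'd2']
```

Because cosine similarity divides by vector length, a long document is not favored merely for containing more words.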
Boolean Retrieval Model
The simplest retrieval model, which treats queries as Boolean expressions (AND, OR, NOT) and returns documents that exactly satisfy the logical conditions. It provides no ranking of results.
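With postings stored as sets of document IDs, the three Boolean operators map onto set operations. This is a sketch over hypothetical postings; production systems typically merge sorted postings lists rather than build Python sets.

```python
# Hypothetical postings: term -> set of document IDs containing it.
postings = {
    "cats": {1, 2},
    "dogs": {2, 3},
    "mice": {1, 3, 4},
}
all_docs = {1, 2, 3, 4}

# cats AND dogs: intersection
and_result = postings["cats"] & postings["dogs"]
# cats OR dogs: union
or_result = postings["cats"] | postings["dogs"]
# mice AND NOT cats: intersect with the complement
not_result = postings["mice"] & (all_docs - postings["cats"])

print(and_result)  # {2}
print(or_result)   # {1, 2, 3}
print(not_result)  # {3, 4}
```

Each result is an unordered set: a document either matches or it does not, which is why the model produces no ranking.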
BM25 (Best Matching 25)
A probabilistic ranking function used to estimate the relevance of documents to a given query. It refines TF-IDF-style weighting with document length normalization and term-frequency saturation (each additional occurrence of a term adds less to the score than the one before), and is widely used as a strong baseline in modern search systems.
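The sketch below shows one common form of the scoring function. The parameter defaults k1 = 1.5 and b = 0.75 are typical choices rather than fixed constants, and the IDF used here is a non-negative variant, log(1 + (N - df + 0.5) / (df + 0.5)); other variants exist. The function name and corpus are assumptions for illustration.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with a BM25 variant."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    counts = Counter(doc)
    # Length normalization: documents longer than average are penalized.
    length_norm = 1 - b + b * len(doc) / avg_len
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # term absent from the collection
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = counts[term]
        # Saturation: the tf fraction approaches (k1 + 1) as tf grows.
        score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
    return score

# Hypothetical pre-tokenized corpus.
corpus = [
    ["cat", "sat"],
    ["cat", "sat", "on", "the", "warm", "mat"],
    ["dog", "barked"],
]
short_score = bm25_score(["cat"], corpus[0], corpus)
long_score = bm25_score(["cat"], corpus[1], corpus)
```

Both documents contain "cat" once, but the shorter one scores higher because of length normalization. Setting b = 0 disables that normalization, and a larger k1 delays saturation.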
Relevance Feedback
A technique where the system uses user judgments on initially retrieved documents to refine the query and improve subsequent retrieval results. It can be explicit (user marks relevant documents) or implicit (inferred from click behavior).
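A classic way to implement explicit feedback is the Rocchio algorithm, which the text above does not name; it is used here as one illustrative option. The query vector is moved toward the centroid of documents judged relevant and away from the centroid of those judged non-relevant. The weights alpha, beta, and gamma below are commonly quoted defaults, and all vectors are hypothetical.

```python
from collections import defaultdict

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a refined query vector (term -> weight) from user judgments."""
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    for doc in relevant:
        for term, weight in doc.items():
            updated[term] += beta * weight / len(relevant)
    for doc in non_relevant:
        for term, weight in doc.items():
            updated[term] -= gamma * weight / len(non_relevant)
    # Terms whose weight is not positive are dropped, a common convention.
    return {t: w for t, w in updated.items() if w > 0}

# Hypothetical judgments for the ambiguous query "jaguar".
query = {"jaguar": 1.0}
relevant = [{"jaguar": 1.0, "cat": 1.0}]
non_relevant = [{"jaguar": 1.0, "car": 1.0}]
new_query = rocchio(query, relevant, non_relevant)
```

Here the refined query gains the term "cat" from the relevant document, while "car" ends up with a negative weight and is dropped.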
Query Expansion
The process of automatically adding additional terms to a user's original query to improve retrieval effectiveness. Terms can be drawn from thesauri, user feedback, or co-occurrence statistics in the document collection.
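The thesaurus-based case can be sketched in a few lines. The `SYNONYMS` table below is a hand-made stand-in for a real thesaurus, and `expand_query` is a hypothetical helper; real systems usually down-weight the added terms rather than treat them like the original ones.

```python
# Hypothetical thesaurus: term -> related terms.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive"],
}

def expand_query(query, synonyms=SYNONYMS):
    """Append thesaurus terms to the query, keeping original terms first."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded

print(expand_query("cheap car"))
# ['cheap', 'car', 'inexpensive', 'automobile', 'vehicle']
```

Expansion tends to improve recall, since documents that use "automobile" instead of "car" can now match, at some risk to precision.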
PageRank
An algorithm developed by Larry Page and Sergey Brin that ranks web pages based on the structure of hyperlinks. A page receives a higher score if it is linked to by many pages, especially by pages that themselves have high PageRank scores.
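The recursive definition can be approximated by power iteration, sketched below. The damping factor of 0.85 is the commonly cited value, the fixed iteration count is an arbitrary choice, and the four-page link graph is made up. The sketch assumes every link target also appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration over a dict: page -> outlinks."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a baseline share from random jumps.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: C is linked to by A, B, and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
best = max(ranks, key=ranks.get)
```

In this graph C ends up with the highest score because three pages link to it, and A ranks second because its only inlink comes from the high-scoring C. D, which no page links to, keeps only the baseline share.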
Neural Information Retrieval
The application of deep learning and neural network models to information retrieval tasks, including learned dense representations (embeddings), cross-encoders for re-ranking, and end-to-end retrieval models that move beyond traditional term-matching approaches.
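As a rough sketch of dense retrieval only, the snippet below ranks documents by the dot product between a query embedding and document embeddings. The three-dimensional vectors are hand-written placeholders, not the output of any trained model, and all names are hypothetical. In practice the vectors would come from a learned encoder and be searched with an approximate nearest-neighbor index.

```python
def dot(a, b):
    """Dot product of two equal-length dense vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical embeddings, hand-written for illustration only.
doc_embeddings = {
    "doc_car": [0.9, 0.1, 0.0],
    "doc_cooking": [0.0, 0.2, 0.9],
    "doc_truck": [0.8, 0.3, 0.1],
}
# Stand-in for an encoded query such as "automobile".
query_embedding = [1.0, 0.0, 0.0]

def dense_retrieve(query_vec, doc_vecs, k=2):
    """Return the k document IDs with the highest dot-product score."""
    ranked = sorted(doc_vecs, key=lambda d: dot(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]

top = dense_retrieve(query_embedding, doc_embeddings)
print(top)  # ['doc_car', 'doc_truck']
```

The point of the example is that matching happens in embedding space, so a document can be retrieved even if it shares no literal terms with the query. A cross-encoder re-ranker would then rescore the top results by reading the query and each document together.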