Computational Biology Cheat Sheet
The core ideas of computational biology, distilled into a single scannable reference for review or quick lookup.
Quick Reference
Sequence Alignment
The process of arranging DNA, RNA, or protein sequences to identify regions of similarity that may indicate functional, structural, or evolutionary relationships. Global alignment (Needleman-Wunsch) aligns entire sequences end-to-end, while local alignment (Smith-Waterman) finds the most similar subsequences.
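The difference between the two strategies is easiest to see in the dynamic-programming recurrences. The following is a minimal sketch that computes only the optimal scores (no traceback). The function names and the scoring values (match +1, mismatch -1, linear gap penalty -2) are illustrative assumptions, not standard defaults.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score for aligning a and b end-to-end."""
    # Row 0: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of subsequences."""
    best = 0
    prev = [0] * (len(b) + 1)  # borders are 0: an alignment may start anywhere
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The extra 0 lets an alignment restart instead of going negative.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best
```

The only structural differences are the zero-initialized borders, the floor at 0, and taking the maximum over the whole table rather than the bottom-right cell.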
Phylogenetic Analysis
The computational reconstruction of evolutionary relationships among organisms or genes based on molecular data such as DNA or protein sequences. Methods include maximum likelihood, Bayesian inference, and neighbor-joining, producing tree-like diagrams (phylogenies) that depict ancestry and divergence.
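As an illustration of the distance-based approach, here is a sketch of a single neighbor-joining step: given a pairwise distance matrix, it picks the pair of taxa to join first and computes their branch lengths to the new internal node. The function name `nj_first_join` and the input layout (a list of names plus a symmetric list-of-lists matrix) are assumptions made for this example. A full implementation would repeat the step on a reduced matrix until the tree is complete.

```python
def nj_first_join(names, dist):
    """Return the first pair joined by neighbor-joining and its branch lengths.

    names: list of taxon labels (at least three).
    dist:  symmetric matrix (list of lists) with zeros on the diagonal.
    """
    n = len(names)
    totals = [sum(row) for row in dist]  # total distance from each taxon
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Q criterion: small when i and j are close to each other
            # but far from everything else.
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from i and j to the new internal node.
    len_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (names[i], names[j]), (len_i, len_j)
```

Note that the pair chosen is not necessarily the pair with the smallest raw distance. The Q criterion corrects for taxa that sit on long branches.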
Protein Structure Prediction
The computational prediction of a protein's three-dimensional structure from its amino acid sequence. Approaches include homology modeling, ab initio methods, and deep learning systems such as AlphaFold, all of which aim to predict how a polypeptide chain folds into its functional conformation.
Genome Assembly
The computational process of reconstructing a complete genome sequence from shorter sequencing reads. De novo assembly builds genomes without a reference, while reference-guided assembly maps reads to an existing genome. Algorithms must handle millions to billions of overlapping fragments.
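To make the idea of merging overlapping fragments concrete, here is a toy greedy de novo assembler: it repeatedly merges the two sequences with the longest suffix-prefix overlap. This is a sketch under strong assumptions (error-free reads, a single strand, no repeats), and the names `overlap`, `greedy_assemble`, and `min_overlap` are hypothetical. Real assemblers typically rely on overlap graphs or de Bruijn graphs and far more efficient indexing.

```python
def overlap(a, b, min_len=1):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge reads greedily by largest overlap; return the resulting contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining pair overlaps by at least min_overlap
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
```

The all-against-all comparison is quadratic in the number of reads on every round, which is why this approach does not scale to the millions or billions of fragments in a real sequencing run.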
Gene Regulatory Networks
Mathematical and computational models describing how genes, transcription factors, and other molecules interact to control gene expression levels within a cell. These networks capture the logic of cellular decision-making and can be represented as Boolean networks, differential equations, or probabilistic graphical models.
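A Boolean network is the simplest of these representations to sketch in code. The example below uses a hypothetical three-gene ring in which A activates B, B activates C, and C represses A, with all genes updated synchronously. The wiring and the function names are invented for illustration and do not describe a real pathway.

```python
def step(state):
    """One synchronous update of the hypothetical network.

    state is a tuple (a, b, c) of 0/1 expression levels.
    Rules: A is on unless C is on; B copies A; C copies B.
    """
    a, b, c = state
    return (int(not c), a, b)


def find_attractor(state, update=step):
    """Iterate from a start state until a state repeats; return the cycle."""
    seen = {}        # state -> index at which it was first visited
    trajectory = []
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = update(state)
    # Everything from the first visit of the repeated state onward is the cycle.
    return trajectory[seen[state]:]
```

Because the state space is finite and the update is deterministic, every trajectory must end in an attractor: either a fixed point (a cycle of length 1) or an oscillation. In this toy network, starting from `(1, 0, 0)` leads to a six-state oscillation, while `(1, 0, 1)` alternates between two states.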
Molecular Dynamics Simulation
A computational method that simulates the physical movements of atoms and molecules over time by numerically solving Newton's equations of motion. It is used to study protein folding, ligand binding, membrane dynamics, and other biomolecular processes at atomic resolution.
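The numerical core of such a simulation is a time-stepping integrator. Below is a minimal sketch of the velocity Verlet scheme, a commonly used integrator, applied to one particle in one dimension. The function signature and the harmonic "bond" force used in the usage lines are assumptions chosen for illustration. Production codes add force fields, neighbor lists, thermostats, and constraints on top of this loop.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation m * x'' = force(x) with velocity Verlet.

    Returns the trajectory as a list of (position, velocity) pairs.
    """
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


# Usage: a unit-mass particle on a harmonic spring with force constant k = 1.
spring = lambda x: -1.0 * x
traj = velocity_verlet(x=1.0, v=0.0, force=spring, mass=1.0, dt=0.001, n_steps=6283)
x_end, v_end = traj[-1]
energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2  # should stay close to the initial 0.5
```

Velocity Verlet is popular because it is time-reversible and keeps the total energy close to its initial value over long runs, which simpler schemes such as forward Euler do not.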
Machine Learning in Genomics
The application of machine learning algorithms, including deep neural networks, random forests, and support vector machines, to identify patterns in genomic data. Tasks include variant calling, gene expression prediction, regulatory element identification, and disease classification from omics data.
Systems Biology
An approach that studies biological systems holistically by integrating data from genomics, proteomics, metabolomics, and other high-throughput methods into computational models. It aims to understand emergent properties of biological systems that cannot be predicted from individual components alone.
Hidden Markov Models in Biology
A statistical model in which a system transitions between hidden states according to probabilistic rules, with each state emitting observable signals. In computational biology, HMMs are widely used for gene finding, protein domain identification, sequence alignment, and chromatin state annotation.
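Recovering the most probable sequence of hidden states is done with the Viterbi algorithm. The sketch below works in log space and assumes a non-empty observation sequence and strictly positive probabilities. The two-state "AT-rich" versus "GC-rich" model and all of its probabilities are made up for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for a non-empty observation sequence."""
    # Log-probability of the best path ending in each state after the first symbol.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
               for s in states}]
    backpointers = []
    for symbol in obs[1:]:
        current, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in s at this position.
            prev = max(states,
                       key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            current[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: regions of differing base composition.
STATES = ["AT", "GC"]
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {"AT": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "GC": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}

path = viterbi("AATTGCGCGC", STATES, START, TRANS, EMIT)
```

Working with log probabilities avoids numerical underflow, which would otherwise occur quickly on sequences of realistic length.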
Single-Cell Transcriptomics Analysis
Computational methods for analyzing gene expression data measured at the resolution of individual cells. Techniques include dimensionality reduction (PCA, t-SNE, UMAP), clustering to identify cell types, trajectory inference to model differentiation, and differential expression testing between cell populations.
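As a small illustration of the dimensionality-reduction step, the sketch below finds the first principal component of a cells-by-genes matrix using power iteration on the covariance matrix, then projects each cell onto it. It is written in plain Python for readability. The function name and the input layout (one row per cell, at least two cells) are assumptions, and in practice one would use an optimized linear-algebra library and keep many components.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (direction, scores) for the first principal component.

    rows: list of equal-length numeric lists, one per cell (at least two).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix between genes (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication converges to the top
    # eigenvector, provided the start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along the current direction
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

The returned scores are the coordinates of each cell along the direction of greatest variance. The sign of a principal component is arbitrary, so only relative positions are meaningful.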