Computational Biology Cheat Sheet

The core ideas of Computational Biology distilled into a single, scannable reference — perfect for review or quick lookup.

PiqCue — piqcue.com/computational-biology/cheatsheet

Quick Reference

Sequence Alignment

The process of arranging DNA, RNA, or protein sequences to identify regions of similarity that may indicate functional, structural, or evolutionary relationships. Global alignment (Needleman-Wunsch) aligns entire sequences end-to-end, while local alignment (Smith-Waterman) finds the most similar subsequences.
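
As a rough illustration of global alignment, the sketch below fills the Needleman-Wunsch dynamic programming table and returns only the optimal score. The scoring values (match +1, mismatch -1, linear gap -2) and the function name are assumptions chosen for the example, not values taken from this sheet.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring parameters are illustrative defaults (assumed, not canonical).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair  # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap             # gap in b
            left = score[i][j - 1] + gap           # gap in a
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]
```

The local (Smith-Waterman) variant differs mainly in clamping every cell at zero and taking the maximum over the whole table rather than the bottom-right cell.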

Phylogenetic Analysis

The computational reconstruction of evolutionary relationships among organisms or genes based on molecular data such as DNA or protein sequences. Methods include maximum likelihood, Bayesian inference, and neighbor-joining, producing tree-like diagrams (phylogenies) that depict ancestry and divergence.
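
Distance-based methods such as neighbor-joining start from a matrix of pairwise distances. The sketch below computes one from pre-aligned, equal-length sequences using the Jukes-Cantor correction; the choice of that substitution model and all function names are assumptions made for illustration.

```python
import math

def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal length, and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jukes_cantor(a, b):
    """Jukes-Cantor distance: estimated substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined here
    return -0.75 * math.log(1 - 4 * p / 3)

def distance_matrix(seqs):
    """Pairwise distances for a dict mapping taxon name -> aligned sequence."""
    return {(m, n): jukes_cantor(seqs[m], seqs[n]) for m in seqs for n in seqs}
```

A tree-building step (neighbor-joining, for instance) would then take this matrix as input; that step is not shown here.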

Protein Structure Prediction

The computational prediction of a protein's three-dimensional structure from its amino acid sequence. Approaches include homology modeling, ab initio methods, and deep learning systems such as AlphaFold, all of which aim to predict how a polypeptide chain folds into its functional conformation.

Genome Assembly

The computational process of reconstructing a complete genome sequence from shorter sequencing reads. De novo assembly builds genomes without a reference, while reference-guided assembly maps reads to an existing genome. Algorithms must handle millions to billions of overlapping fragments.

Gene Regulatory Networks

Mathematical and computational models describing how genes, transcription factors, and other molecules interact to control gene expression levels within a cell. These networks capture the logic of cellular decision-making and can be represented as Boolean networks, differential equations, or probabilistic graphical models.
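
The Boolean-network representation can be sketched in a few lines. The three genes and their update rules below are entirely hypothetical and exist only to show synchronous updating; they do not describe any real regulatory circuit.

```python
def step(state, rules):
    """Synchronously update every gene from the current state."""
    return {gene: rule(state) for gene, rule in rules.items()}

def simulate(state, rules, steps):
    """Return the list of states visited, starting with the initial state."""
    trajectory = [state]
    for _ in range(steps):
        state = step(state, rules)
        trajectory.append(state)
    return trajectory

# Hypothetical toy network (assumed for illustration only).
RULES = {
    "A": lambda s: not s["C"],         # C represses A
    "B": lambda s: s["A"],             # A activates B
    "C": lambda s: s["A"] and s["B"],  # A and B together activate C
}

START = {"A": True, "B": False, "C": False}
trajectory = simulate(START, RULES, 5)
```

With these rules the network returns to its starting state after five updates, a small example of the cyclic attractors that Boolean models are often used to study.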

Molecular Dynamics Simulation

A computational method that simulates the physical movements of atoms and molecules over time by numerically solving Newton's equations of motion. It is used to study protein folding, ligand binding, membrane dynamics, and other biomolecular processes at atomic resolution.
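
To make "numerically solving Newton's equations of motion" concrete, here is a minimal sketch using the velocity Verlet scheme on a single particle in one dimension. The integrator choice, the harmonic spring standing in for a real force field, and all names are assumptions for illustration.

```python
def velocity_verlet(x, v, force, mass, dt, steps):
    """Integrate one particle in 1D; return lists of positions and velocities."""
    xs, vs = [x], [v]
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt  # position update
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # velocity from averaged acceleration
        a = a_new
        xs.append(x)
        vs.append(v)
    return xs, vs

def spring_force(x, k=1.0):
    """Toy harmonic force, a stand-in for a molecular force field."""
    return -k * x

# Unit mass released from x = 1 at rest, integrated for 1000 steps of 0.01.
xs, vs = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, steps=1000)
```

Real simulations apply the same update to every atom in three dimensions, with forces derived from a full force field and much more careful handling of time step and boundary conditions.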

Machine Learning in Genomics

The application of machine learning algorithms, including deep neural networks, random forests, and support vector machines, to identify patterns in genomic data. Tasks include variant calling, gene expression prediction, regulatory element identification, and disease classification from omics data.

Systems Biology

An approach that studies biological systems holistically by integrating data from genomics, proteomics, metabolomics, and other high-throughput methods into computational models. It aims to understand emergent properties of biological systems that cannot be predicted from individual components alone.

Hidden Markov Models in Biology

A hidden Markov model (HMM) is a statistical model in which a system transitions between hidden states according to probabilistic rules, with each state emitting observable signals. In computational biology, HMMs are widely used for gene finding, protein domain identification, sequence alignment, and chromatin state annotation.
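
Decoding the most probable hidden-state path is commonly done with the Viterbi algorithm. The sketch below applies it to a two-state toy model that labels a DNA string as GC-rich or AT-rich; the states and every probability are made-up values for illustration, and the code assumes a non-empty sequence and no zero probabilities.

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for a non-empty sequence."""
    # best[t][s]: log-probability of the best path that ends in state s at time t.
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical parameters (assumed, not estimated from data).
STATES = ("GC_rich", "AT_rich")
START_P = {"GC_rich": 0.5, "AT_rich": 0.5}
TRANS_P = {
    "GC_rich": {"GC_rich": 0.9, "AT_rich": 0.1},
    "AT_rich": {"GC_rich": 0.1, "AT_rich": 0.9},
}
EMIT_P = {
    "GC_rich": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT_rich": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path = viterbi("GCGCGCATATAT", STATES, START_P, TRANS_P, EMIT_P)
```

Working in log space avoids numerical underflow on long sequences, which is why the products of probabilities appear as sums.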

Single-Cell Transcriptomics Analysis

Computational methods for analyzing gene expression data measured at the resolution of individual cells. Techniques include dimensionality reduction (PCA, t-SNE, UMAP), clustering to identify cell types, trajectory inference to model differentiation, and differential expression testing between cell populations.

Key Terms at a Glance

Algorithm: A step-by-step procedure or formula for solving a computational problem, such as sequence alignment or genome assembly.

Bayesian Inference: A statistical method that updates probability estimates as new data becomes available, widely used in phylogenetics and gene expression analysis.
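
The updating step is Bayes' rule: posterior is proportional to prior times likelihood. A minimal sketch over a discrete set of hypotheses follows; the variant-calling scenario and all numbers in it are invented for illustration.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a discrete set of hypotheses."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # total probability of the data
    return {h: p / evidence for h, p in unnormalized.items()}

# Hypothetical numbers: prior belief that a site carries a variant, and the
# probability of seeing one supporting read under each hypothesis.
prior = {"variant": 0.01, "no_variant": 0.99}
read_likelihood = {"variant": 0.9, "no_variant": 0.05}

after_one_read = posterior(prior, read_likelihood)
# The posterior becomes the prior when a second, independent read arrives.
after_two_reads = posterior(after_one_read, read_likelihood)
```

Under these assumed values the probability of a variant rises from 1% to roughly 15% after one supporting read and to roughly 77% after two.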

Bioinformatics: The science of collecting, storing, analyzing, and interpreting biological data, especially molecular sequences and genomic information.

BLAST: Basic Local Alignment Search Tool, a heuristic algorithm for rapidly finding regions of similarity between biological sequences.

Clustering: Grouping data points by similarity without predefined labels. Used in single-cell analysis, gene expression profiling, and protein family classification.

Contig: A contiguous stretch of DNA sequence assembled from overlapping shorter sequencing reads during genome assembly.

CRISPR: Clustered Regularly Interspaced Short Palindromic Repeats, bacterial DNA sequences that form the basis of CRISPR-Cas genome editing. Target design and off-target analysis rely heavily on computational tools.

De Bruijn Graph: A directed graph used in genome assembly in which nodes represent k-mers and edges connect k-mers that overlap by k-1 bases.
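
A small sketch of building such a graph from reads, following the convention used in this entry (k-mers as nodes, an edge wherever one k-mer directly follows another in a read). The function name and the example reads are assumptions for illustration.

```python
def de_bruijn(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in a read."""
    graph = {}
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for kmer in kmers:
            graph.setdefault(kmer, set())  # register nodes with no out-edges too
        for left, right in zip(kmers, kmers[1:]):
            # Consecutive k-mers share k-1 bases: left[1:] == right[:-1].
            graph[left].add(right)
    return graph

# Two overlapping toy reads link up through the shared k-mer "CGT".
graph = de_bruijn(["ACGT", "CGTA"], k=3)
```

Many assemblers use the equivalent alternative convention in which nodes are (k-1)-mers and each k-mer is an edge; an assembler then looks for paths through the graph that spell out contigs.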

Dynamic Programming: An algorithmic technique that breaks a problem into overlapping subproblems and stores each subproblem's solution for reuse, used in sequence alignment algorithms such as Needleman-Wunsch and Smith-Waterman.

Expression Profile: A measurement of the activity (expression level) of thousands of genes simultaneously, used to identify patterns of gene regulation.

Flux Balance Analysis: A mathematical method for analyzing metabolic networks at steady state by optimizing an objective function subject to stoichiometric constraints.
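
Written out, the optimization is usually posed as a linear program. The notation below is an assumption introduced here: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c the objective weights (often selecting a biomass reaction), and v_min and v_max per-reaction flux bounds.

```latex
\begin{aligned}
\max_{v} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& v_{\min} \le v \le v_{\max}
\end{aligned}
```

The equality S v = 0 encodes the steady-state assumption: every metabolite is produced and consumed at the same rate.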

Gene Ontology: A standardized vocabulary for describing gene and protein functions, organized into molecular function, biological process, and cellular component categories.

Genome Annotation: The process of identifying and labeling features in a genome sequence, including genes, regulatory elements, and repetitive sequences.

Homology: Shared evolutionary ancestry between sequences or structures, usually inferred from their similarity and used as the basis for functional prediction.

K-mer: A substring of length k derived from a biological sequence. K-mer frequency analysis is used in genome assembly, error correction, and metagenomics.
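
K-mer frequency analysis reduces to sliding a window of length k along the sequence and tallying what it sees. A minimal sketch, with a hypothetical function name:

```python
from collections import Counter

def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

counts = count_kmers("ATATAT", 2)  # AT occurs 3 times, TA occurs 2 times
```

In sequencing data, k-mers seen only once or twice are often treated as likely errors, which is the idea behind k-mer based error correction.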
