Computational Biology Glossary

25 essential terms, because precise language is the foundation of clear thinking in computational biology.


A step-by-step procedure or formula for solving a computational problem, such as sequence alignment or genome assembly.

A statistical method that uses Bayes' theorem to update the probability of a hypothesis as new data become available; widely used in phylogenetics and gene expression analysis.
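
A minimal sketch of Bayesian updating, assuming the simplest conjugate case: a Beta prior on an unknown proportion (here a hypothetical fraction of sequencing reads carrying a variant allele) combined with binomial count data. The function names and read counts are illustrative and not taken from any particular tool.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with binomial
    data gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Uniform prior Beta(1, 1): every allele fraction is equally plausible a priori.
alpha, beta = 1, 1

# Hypothetical batches: (reads with the variant allele, reads with the reference allele).
batches = [(3, 7), (12, 18)]
for variant, reference in batches:
    # The previous posterior serves as the prior for the next batch.
    alpha, beta = update_beta(alpha, beta, variant, reference)
    print(f"posterior Beta({alpha}, {beta}), mean = {float(beta_mean(alpha, beta)):.3f}")
```

The two printed posteriors are Beta(4, 8) with mean 0.333 and Beta(16, 26) with mean 0.381. Updating batch by batch gives the same posterior as one update with all 15 variant and 25 reference reads, which is what makes sequential updating convenient as data arrive.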

The science of collecting, storing, analyzing, and interpreting biological data, especially molecular sequences and genomic information.

Basic Local Alignment Search Tool, a heuristic algorithm for rapidly finding regions of similarity between biological sequences.

Grouping data points by similarity without predefined labels. Used in single-cell analysis, gene expression profiling, and protein family classification.
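
As one concrete instance of clustering, the sketch below implements plain k-means (Lloyd's algorithm) in standard-library Python. k-means is only one of many clustering methods (hierarchical and graph-based methods are also common in single-cell work), and the two-gene "expression" values are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: alternate assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k distinct data points
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments are stable: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical expression levels of two genes in six cells.
cells = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, clusters = kmeans(cells, k=2)
for members in clusters:
    print(members)
```

No labels are supplied; the two groups of cells emerge only from their similarity. Note that k must be chosen in advance, which is one reason other methods are often preferred for exploratory analysis.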

A contiguous stretch of DNA sequence assembled from overlapping shorter sequencing reads during genome assembly.
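
A toy illustration of how overlapping reads merge into a contig, using greedy suffix-prefix merging. This is a deliberately simplified technique, assuming error-free reads from a single strand and a repeat-free sequence; real assemblers handle errors, reverse complements, and repeats, typically with overlap-layout-consensus or de Bruijn graph methods. The reads are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b,
    or 0 if no such overlap is at least min_len long."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining sequences are separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))
```

The four 10-base reads collapse into the single contig `ATTAGACCTGCCGGAATAC`; reads that share no sufficiently long overlap would be returned as separate contigs.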

Clustered Regularly Interspaced Short Palindromic Repeats, DNA repeat arrays from bacterial and archaeal immune systems that underlie CRISPR-Cas genome editing; guide RNA target design and off-target analysis for this technology rely heavily on computational tools.

A directed graph data structure used in genome assembly where nodes represent k-mers and edges connect k-mers that overlap by k-1 bases (an equivalent, widely used formulation makes (k-1)-mers the nodes and k-mers the edges).
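
A minimal sketch of building such a graph, with distinct k-mers as nodes and an edge wherever one k-mer's last k-1 bases equal the next k-mer's first k-1 bases. It assumes error-free reads from a linear, repeat-free sequence, so the graph is a single unbranched path; real assemblers must resolve branches and cycles caused by repeats and errors. The reads are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """All overlapping substrings of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(reads, k):
    """Nodes are the distinct k-mers in the reads; there is an edge u -> v
    when the last k-1 bases of u equal the first k-1 bases of v."""
    nodes = {kmer for read in reads for kmer in kmers(read, k)}
    by_prefix = defaultdict(list)
    for node in nodes:
        by_prefix[node[:-1]].append(node)
    return {node: sorted(by_prefix.get(node[1:], [])) for node in nodes}


def spell_linear_path(graph):
    """Start at a node with no incoming edges and follow single successors,
    appending the last base of every k-mer visited."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for v in successors:
            indegree[v] += 1
    node = next(n for n in sorted(graph) if indegree[n] == 0)
    sequence, seen = node, {node}
    # Stop at a branch, a dead end, or a node already visited (a cycle).
    while len(graph[node]) == 1 and graph[node][0] not in seen:
        node = graph[node][0]
        seen.add(node)
        sequence += node[-1]
    return sequence


reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]
graph = de_bruijn(reads, k=4)
print(spell_linear_path(graph))
```

With k = 4 the three reads yield seven distinct 4-mers forming one path, which spells `ATGGCGTGCA`. Unlike pairwise overlap comparison, the graph is built from k-mers alone, which is why this structure scales to very large read sets.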

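An algorithmic technique that solves a problem by breaking it into overlapping subproblems and storing each subproblem's solution for reuse; it underlies classic sequence alignment algorithms such as Needleman-Wunsch (global) and Smith-Waterman (local).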
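
A sketch of dynamic programming applied to global sequence alignment in the Needleman-Wunsch style, assuming a simple linear gap penalty and hypothetical scores (match +1, mismatch -1, gap -1). Each cell reuses three previously solved subproblems instead of enumerating every possible alignment.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming with a linear gap cost.
    score[i][j] holds the best score for aligning a[:i] with b[:j]."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Traceback: retrace which subproblem produced each optimal cell.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])  # gap in b
            bottom.append("-")
            i -= 1
        else:
            top.append("-")  # gap in a
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = needleman_wunsch("GATTACA", "GATACA")
print(best)
print(row1)
print(row2)
```

The optimal score here is 5 (six matches and one gap). The table has (n + 1) x (m + 1) cells, so the running time is O(nm). Smith-Waterman local alignment uses the same recurrence but clamps every cell at zero and traces back from the highest-scoring cell.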

A measurement of the activity (expression level) of thousands of genes simultaneously, used to identify patterns of gene regulation.

A mathematical method for analyzing metabolic networks at steady state by optimizing an objective function (for example, biomass production) subject to stoichiometric constraints.

A standardized vocabulary for describing gene and protein functions, organized into molecular function, biological process, and cellular component categories.

The process of identifying and labeling features in a genome sequence, including genes, regulatory elements, and repetitive sequences.

Similarity between sequences or structures due to shared evolutionary ancestry, used as the basis for functional prediction.

A substring of length k derived from a biological sequence. K-mer frequency analysis is used in genome assembly, error correction, and metagenomics.
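
A minimal k-mer counting sketch. The optional canonical mode counts a k-mer and its reverse complement together, a common convention because a read may come from either DNA strand. The sequence is hypothetical and the code assumes uppercase A/C/G/T input.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(seq, k, canonical=False):
    """Count every overlapping k-mer. With canonical=True, a k-mer and its
    reverse complement are counted under the lexicographically smaller one."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if canonical:
            kmer = min(kmer, reverse_complement(kmer))
        counts[kmer] += 1
    return counts


print(count_kmers("ATGATGATC", 3))
print(count_kmers("ATGATGATC", 3, canonical=True))
```

A sequence of length L has L - k + 1 overlapping k-mers (7 here). In canonical mode `GAT` and its reverse complement `ATC` are pooled, giving `ATC` a count of 3.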

A statistical method that estimates parameters by finding values that maximize the probability of observing the given data, commonly used in phylogenetics.
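
A small worked example of maximum likelihood: estimating the evolutionary distance d (substitutions per site) between two aligned DNA sequences under the Jukes-Cantor (JC69) model. The sketch assumes independent, gap-free sites and a mismatch fraction below 0.75; the counts are hypothetical. It maximizes the log-likelihood numerically and checks the result against the known closed form d = -3/4 ln(1 - 4p/3).

```python
import math


def jc69_p_distance(d):
    """Expected fraction of differing sites after d substitutions per site under JC69."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def log_likelihood(d, mismatches, sites):
    """Binomial log-likelihood (up to a constant) of the observed mismatches given d."""
    p = jc69_p_distance(d)
    return mismatches * math.log(p) + (sites - mismatches) * math.log(1.0 - p)


def ml_distance(mismatches, sites, lo=1e-9, hi=5.0, iterations=200):
    """Maximize the log-likelihood over d in [lo, hi] by golden-section search,
    which is valid because the log-likelihood is unimodal in d."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
        if log_likelihood(c, mismatches, sites) < log_likelihood(e, mismatches, sites):
            a = c  # the maximum lies in [c, b]
        else:
            b = e  # the maximum lies in [a, e]
    return (a + b) / 2.0


def jc69_closed_form(mismatches, sites):
    """Analytic ML estimate: d = -3/4 * ln(1 - 4p/3) with p = mismatches / sites."""
    p = mismatches / sites
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical data: 20 differences over 100 aligned, gap-free sites.
print(f"numerical ML: {ml_distance(20, 100):.4f}")
print(f"closed form:  {jc69_closed_form(20, 100):.4f}")
```

Both lines print approximately 0.2326, larger than the raw 0.20 mismatch fraction because the model accounts for multiple substitutions at the same site. Phylogenetic programs apply the same principle to whole trees, where no closed form exists and numerical optimization is required.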

The study of genetic material recovered directly from environmental samples, enabling analysis of entire microbial communities without culturing individual organisms.

A short, recurring sequence pattern with biological significance, such as a transcription factor binding site in DNA or a conserved domain in proteins.
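
A minimal sketch of searching a DNA sequence for a degenerate motif written in IUPAC ambiguity codes, translated into a regular expression. The motif and sequence are hypothetical; the sketch scans only the given strand and uses exact pattern matching, whereas motif analysis often scores both strands with position weight matrices.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(seq, motif):
    """0-based start positions of every match of a degenerate motif in seq.
    The zero-width lookahead (?=...) lets overlapping matches be reported."""
    body = "".join(IUPAC[base] for base in motif.upper())
    pattern = re.compile(f"(?={body})")
    return [m.start() for m in pattern.finditer(seq.upper())]


# Hypothetical motif and sequence; W stands for A or T.
print(find_motif("GCTATAAATGTATATATC", "TATAWAW"))
```

The motif matches at positions 2 and 10. Without the lookahead, `finditer` would skip matches that overlap an earlier one, which matters for repetitive sites.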

One of a set of genes in different species that descended from a single ancestral gene through speciation; orthologs often, though not always, retain the same function.
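
A sketch of reciprocal best hits (RBH), a common heuristic for predicting orthologs from all-against-all similarity searches between two genomes. The scores stand in for something like BLAST bit scores and are hypothetical, as are the gene names; the example includes two paralogs in species A to show how RBH treats them.

```python
def best_hits(scores):
    """scores maps (query, subject) -> similarity score.
    Returns each query's single highest-scoring subject."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores; a2 and a3 are paralogs in species A, both similar to b2.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b1"): 80,
          ("a2", "b2"): 210, ("a3", "b2"): 190}
b_vs_a = {("b1", "a1"): 248, ("b1", "a2"): 85, ("b2", "a1"): 88,
          ("b2", "a2"): 205, ("b2", "a3"): 185}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

RBH reports the pairs (a1, b1) and (a2, b2). Because it only returns one-to-one pairs, a3 is dropped even though, if a2 and a3 arose by duplication after speciation, both would be co-orthologs of b2; tree-based and graph-based orthology methods are designed to handle such cases.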

A gene related to another within the same genome by duplication, which may evolve divergent functions over time.

The large-scale study of the entire set of proteins produced by an organism, including their structures, functions, and interactions.

The computational process of aligning short sequencing reads to a reference genome to determine their genomic origin.
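
A simplified seed-and-verify sketch of read mapping, assuming ungapped placements and a plain hash index of reference k-mers. Exact k-mer seeds propose candidate start positions, and each candidate is checked by counting mismatches. Production aligners such as BWA and Bowtie2 instead use FM-index structures and gapped alignment; the function names, reference, and reads here are hypothetical.

```python
from collections import defaultdict


def build_index(reference, k):
    """Map every k-mer in the reference to its list of 0-based positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=1):
    """Return sorted (start, mismatches) for ungapped placements of the read."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset  # where the read would begin if this seed is correct
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        window = reference[start:start + len(read)]
        mismatches = sum(1 for x, y in zip(read, window) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


reference = "TTGACCATGCAGTCCATGGA"
index = build_index(reference, k=4)
print(map_read("CATGCAGT", reference, index, k=4))  # exact match
print(map_read("CATGGAGT", reference, index, k=4))  # one mismatch
```

Both reads map to position 5, the second with one mismatch. The seed `CATG` also occurs at position 14, but a read starting there would run past the end of the reference, so that candidate is discarded; multiple surviving candidates would indicate an ambiguously mapped read.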

A matrix (such as BLOSUM62 or PAM250 for proteins) that assigns a score to every possible pair of aligned amino acids or nucleotides, typically as log-odds values derived from substitution frequencies observed in related sequences.
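
A minimal sketch of how a scoring matrix is used to score an ungapped alignment. To avoid misquoting published values, it builds a hypothetical 4x4 nucleotide matrix (match +2, transition -1, transversion -2) rather than reproducing BLOSUM62 or PAM250; protein matrices are applied the same way over a 20x20 table.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def dna_score(x, y, match=2, transition=-1, transversion=-2):
    """Hypothetical nucleotide scores: transitions (purine<->purine or
    pyrimidine<->pyrimidine) are penalized less than transversions."""
    if x == y:
        return match
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return transition
    return transversion


# Precompute the full 4x4 matrix as a dictionary keyed by (base, base).
MATRIX = {(x, y): dna_score(x, y) for x in "ACGT" for y in "ACGT"}


def score_ungapped(a, b, matrix=MATRIX):
    """Sum of matrix scores over the columns of a gap-free alignment."""
    if len(a) != len(b):
        raise ValueError("an ungapped alignment needs sequences of equal length")
    return sum(matrix[(x, y)] for x, y in zip(a, b))


print(score_ungapped("ACGT", "GCGA"))  # -1 + 2 + 2 - 2 = 1
```

The matrix is symmetric, and alignment algorithms look up one entry per aligned column; gap penalties are supplied separately.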

Single Nucleotide Polymorphism, a variation at a single position in a DNA sequence among individuals, the most common type of genetic variation.
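
A minimal sketch that lists single-base differences between a reference and one individual's sequence, assuming the two are already aligned to equal length with no insertions or deletions. Each difference is only a candidate SNP site: real variant calling weighs read depth and base quality, and calling a site a polymorphism depends on its frequency in a population. The sequences are hypothetical, and `N` marks an unknown base.

```python
def diff_sites(reference, sample, unknown="N"):
    """1-based (position, reference base, sample base) wherever two aligned,
    equal-length sequences differ, skipping unknown sample bases."""
    if len(reference) != len(sample):
        raise ValueError("sequences must already be aligned to equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base and alt_base != unknown
    ]


print(diff_sites("ACGTACGTAC", "ACTTACGAAN"))
```

This reports differences at positions 3 (G to T) and 8 (T to A), while position 10 is skipped because the sample base is unknown. Positions are 1-based, matching common variant-file conventions.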

The study of the complete set of RNA transcripts produced by the genome under specific conditions, often measured by RNA-seq or microarrays.
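
RNA-seq read counts are usually normalized before transcript levels are compared. One widely used unit is transcripts per million (TPM): divide each count by transcript length in kilobases, then rescale so the values sum to one million. The sketch below assumes simple per-gene counts and lengths; the gene names and numbers are hypothetical.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million: normalize counts by transcript length (in kb),
    then scale so the values sum to one million."""
    rates = {gene: read_counts[gene] / (lengths_bp[gene] / 1000) for gene in read_counts}
    total = sum(rates.values())
    return {gene: rate / total * 1_000_000 for gene, rate in rates.items()}


# Hypothetical counts and transcript lengths for three genes.
counts = {"geneA": 400, "geneB": 400, "geneC": 200}
lengths = {"geneA": 2000, "geneB": 1000, "geneC": 4000}
for gene, value in tpm(counts, lengths).items():
    print(f"{gene}: {value:.1f}")
```

geneA and geneB receive the same number of reads, but geneB's TPM is twice as high because its transcript is half as long and so yields fewer reads per molecule.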