Bioinformatics Glossary

25 essential terms, because precise language is the foundation of clear thinking in bioinformatics.

Alignment: The process of arranging two or more sequences to identify regions of similarity, which can reveal functional, structural, or evolutionary relationships.
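
As a minimal sketch of how an alignment can be scored, the following computes a global alignment score with the Needleman-Wunsch dynamic programming recurrence. The function name and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not values taken from this glossary; real tools use substitution matrices and more elaborate gap penalties.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of strings a and b.

    Scoring values are illustrative assumptions, not standard defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap      # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "ACT"))  # 1 (three matches, one gap)
```

Only the score is returned here; recovering the aligned strings would require keeping the full table and tracing back through it.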

Annotation: The process of identifying and labeling genes, regulatory elements, and other functional features in a genome.

Bioinformatics: An interdisciplinary field that applies computational and statistical methods to analyze and interpret biological data.

BLAST: Basic Local Alignment Search Tool; a widely used heuristic algorithm for comparing a query sequence against sequence databases.

BLOSUM: BLOcks SUbstitution Matrix; a family of scoring matrices for amino acid substitutions used in protein sequence alignment.

Contig: A contiguous stretch of DNA sequence assembled from overlapping shorter reads.

Coverage: The average number of times each nucleotide in a genome is sequenced; higher coverage generally makes variant detection more reliable.
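
A common back-of-the-envelope estimate of this average is (number of reads × read length) / genome size. The sketch below applies that formula; the function name and the example numbers are made up for illustration, and the estimate assumes every read maps to the genome.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Estimate average coverage as total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size


# Hypothetical run: one million 150 bp reads over a 5 Mb genome.
print(mean_coverage(1_000_000, 150, 5_000_000))  # 30.0
```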

De Bruijn Graph: A directed graph in which nodes are k-mers and edges connect k-mers that overlap by k-1 bases, used in genome assembly algorithms.
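
A rough sketch of this structure, following the node-as-k-mer description above: each k-mer in a read gets an edge to the k-mer that starts one base later, so connected nodes overlap by k-1 bases. The function name is hypothetical, and this simplified version ignores reverse complements, sequencing errors, and edge multiplicities that real assemblers handle.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        # Stop one position early so the successor k-mer always exists.
        for i in range(len(read) - k):
            left = read[i:i + k]
            right = read[i + 1:i + k + 1]
            graph[left].add(right)
    return dict(graph)


print(de_bruijn_graph(["ACGTAC"], 3))
# {'ACG': {'CGT'}, 'CGT': {'GTA'}, 'GTA': {'TAC'}}
```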

Differential Expression: A statistically significant change in gene expression level between two or more experimental conditions.

FASTA: A text-based format for representing nucleotide or amino acid sequences, in which each record consists of a header line beginning with ">" followed by one or more sequence lines.
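
To make the header-plus-sequence layout concrete, here is a minimal parsing sketch. The function name and sample records are invented for illustration; for real work an established parser (for example, Biopython's SeqIO) is likely a better choice.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]      # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {name: "".join(parts) for name, parts in records.items()}


example = ">seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n"
print(parse_fasta(example))
# {'seq1 example': 'ACGTTTGA', 'seq2': 'GGCC'}
```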

FASTQ: A text-based file format storing both nucleotide sequences and per-base quality scores from sequencing instruments.
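
As an illustration, a FASTQ record usually spans four lines: an "@" header, the sequence, a "+" separator, and a quality string with one character per base. The sketch below decodes one record assuming the Phred+33 encoding (quality = ASCII code minus 33); that offset is an assumption about the input, since some older files use an offset of 64. The function name is hypothetical.

```python
def parse_fastq_record(lines):
    """Parse one four-line FASTQ record into (name, sequence, qualities)."""
    header, sequence, separator, quality = (line.strip() for line in lines)
    if not header.startswith("@") or not separator.startswith("+"):
        raise ValueError("not a FASTQ record")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    # Assumes Phred+33: each character's ASCII code minus 33 is the score.
    scores = [ord(ch) - 33 for ch in quality]
    return header[1:], sequence, scores


print(parse_fastq_record(["@read1", "ACGT", "+", "II5!"]))
# ('read1', 'ACGT', [40, 40, 20, 0])
```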

Gene Ontology: A standardized vocabulary describing gene products in terms of Biological Process, Molecular Function, and Cellular Component.

Genome: The complete set of genetic material (DNA) in an organism, including all genes and non-coding regions.

Homology: Shared ancestry between sequences, usually inferred from their similarity; homologs include orthologs (separated by speciation) and paralogs (separated by duplication).

k-mer: A contiguous subsequence of length k extracted from a longer sequence, used in assembly and analysis.
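
For example, a sequence of length n contains n - k + 1 overlapping k-mers. A minimal counting sketch follows; the function name is made up for illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


print(count_kmers("ACGTACG", 3))
# Counter({'ACG': 2, 'CGT': 1, 'GTA': 1, 'TAC': 1})
```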

Metagenomics: The study of genetic material recovered directly from environmental samples to characterize microbial communities.

Motif: A short, recurring sequence pattern with biological significance, such as a transcription factor binding site.
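
In the simplest case, a motif is an exact string, and finding it means listing every position where it occurs, including overlapping occurrences. The sketch below does that with 0-based positions; the function name and example sequence are illustrative, and real motifs are often degenerate and modeled with position weight matrices rather than exact matches.

```python
def find_motif(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping matches are not skipped.
        start = sequence.find(motif, start + 1)
    return positions


print(find_motif("GATATATGCATATACTT", "ATAT"))  # [1, 3, 9]
```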

Next-Generation Sequencing (NGS): High-throughput sequencing technologies that produce millions of reads in parallel at reduced cost per base.

Ortholog: A homologous gene in a different species that arose through speciation and typically retains the same function.

Paralog: A homologous gene within the same genome arising from a duplication event, which may acquire new functions.

Phylogenetics: The study of evolutionary relationships among organisms or genes using molecular sequence data.

Proteomics: The large-scale study of the entire set of proteins produced by an organism, including their structures and functions.

Read: A short DNA sequence generated by a sequencing machine, typically 75 to 300 base pairs long for short-read technologies.

Transcriptome: The complete set of RNA transcripts produced by the genome under specific conditions or in a specific cell type.

Variant: A difference in DNA sequence compared to a reference genome, including single-nucleotide polymorphisms (SNPs), insertions, deletions, and structural changes.
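
As a toy illustration of the simplest case, the sketch below lists single-base differences between a reference and a sample sequence that are already aligned and of equal length. The function name and sequences are invented, and real variant callers work from many aligned reads with quality scores rather than a single pair of strings.

```python
def find_snps(reference, sample):
    """Return (0-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


print(find_snps("ACGTAC", "ACCTAT"))  # [(2, 'G', 'C'), (5, 'C', 'T')]
```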