Statistics — Distribution continuous, Outliers interquartile (extended) Glossary

25 essential terms, because precise language is the foundation of clear thinking in statistics.

Alternative hypothesis: The hypothesis that contradicts the null hypothesis, typically representing the researcher's claim that an effect, difference, or relationship exists in the population.

ANOVA: Analysis of Variance; a statistical method that tests whether the means of three or more groups are significantly different by comparing between-group and within-group variability using the F-statistic.

Bayesian inference: A method of statistical inference that uses Bayes' theorem to update the probability of a hypothesis as new data become available, combining prior beliefs with observed evidence.

Bias: A systematic error in data collection, analysis, or interpretation that causes results to deviate from the true population values. Common forms include selection bias, measurement bias, and confirmation bias.

Central limit theorem: A fundamental theorem stating that the distribution of sample means approximates a normal distribution as the sample size increases, regardless of the shape of the population distribution.
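A quick simulation makes this concrete. The sketch below (invented parameters, standard library only) draws repeated samples from a decidedly non-normal uniform population and shows that the resulting sample means cluster around the population mean with spread close to $\sigma/\sqrt{n}$:

```python
import random
import statistics

random.seed(42)

sample_size = 50      # n: observations per sample
num_samples = 2000    # number of repeated samples

# Population: Uniform(0, 1), mean 0.5, sd sqrt(1/12) ~ 0.289
sample_means = [
    statistics.mean(random.random() for _ in range(sample_size))
    for _ in range(num_samples)
]

# The sample means are centred on 0.5 and their spread is roughly
# sigma / sqrt(n) = 0.289 / sqrt(50) ~ 0.041, per the theorem.
print(round(statistics.mean(sample_means), 2))
print(round(statistics.stdev(sample_means), 3))
```

Plotting these means as a histogram would show the familiar bell shape even though the underlying population is flat.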

Chi-square test: A non-parametric test used to assess the association between categorical variables or to compare observed frequencies with expected frequencies under a specified hypothesis.

Confidence interval: A range of values, computed from sample data, constructed so that under repeated sampling a specified proportion of such intervals (e.g., 95%) would contain the true population parameter.
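As a minimal sketch, a 95% interval for a population mean can be formed with the normal approximation ($\bar{x} \pm 1.96 \cdot s/\sqrt{n}$); the data values below are invented for illustration:

```python
import statistics

data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

n = len(data)
mean = statistics.mean(data)
sem = statistics.stdev(data) / n ** 0.5  # standard error of the mean

# z = 1.96 cuts off 2.5% in each tail of the standard normal
lower = mean - 1.96 * sem
upper = mean + 1.96 * sem
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

For small samples one would normally use a t critical value with $n-1$ degrees of freedom instead of 1.96.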

Correlation coefficient: A numerical measure of the strength and direction of the linear relationship between two variables, most commonly the Pearson coefficient ($r$), which ranges from $-1$ to $+1$.
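Pearson's $r$ is the covariance of the two variables divided by the product of their standard deviations. A small sketch computing it from that definition, with invented paired data:

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Numerator: sum of cross-products of deviations from the means
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
# Denominator: product of the root sums of squared deviations
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))

r = cov / (sd_x * sd_y)
print(round(r, 4))  # near +1: a strong positive linear relationship
```

Python 3.10+ also ships `statistics.correlation`, which performs the same calculation.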

Degrees of freedom: The number of independent values in a statistical calculation that are free to vary. Degrees of freedom affect the shape of test statistic distributions such as the t-distribution and chi-square distribution.

Effect size: A quantitative measure of the magnitude of a phenomenon or the strength of a relationship. Common measures include Cohen's $d$, Pearson's $r$, and eta-squared. Unlike p-values, effect sizes convey practical significance.

Histogram: A graphical representation of the distribution of continuous data, where the data are divided into bins and the height of each bar represents the frequency or relative frequency of observations in that bin.

Hypothesis testing: A formal statistical procedure for making decisions about population parameters by evaluating sample evidence against a null hypothesis, using test statistics and p-values.

Interquartile range (IQR): The difference between the third quartile (75th percentile) and the first quartile (25th percentile), representing the spread of the middle 50% of the data. It is resistant to outliers.
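The standard library can compute the quartiles directly. A sketch with invented data, using `statistics.quantiles` (with `n=4` it returns the three quartile cut points Q1, Q2, Q3):

```python
import statistics

data = [3, 7, 8, 5, 12, 14, 21, 13, 18]

# "inclusive" interpolates between data points, as most spreadsheet
# and NumPy default percentile methods do
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
print(q1, q3, iqr)
```

Note that several quartile conventions exist; `statistics.quantiles` defaults to `method="exclusive"`, which can give slightly different values on small samples.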

Mean: The arithmetic average of a set of values, calculated by summing all values and dividing by the number of observations. It is the most widely used measure of central tendency.

Median: The middle value in an ordered dataset. For an even number of observations, the median is the average of the two central values. It is robust to extreme values and skewed data.
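The contrast between the two measures of central tendency is easy to demonstrate. In this sketch (invented data), appending one extreme value drags the mean far away while the median barely moves:

```python
import statistics

data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

# Without the extreme value, mean and median agree closely
print(statistics.mean(data), statistics.median(data))
# With it, the mean jumps; the median shifts only slightly
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

This is why the median is preferred for skewed quantities such as incomes or house prices.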

Multicollinearity: A condition in regression analysis where two or more independent variables are highly correlated, making it difficult to determine their individual effects on the dependent variable and inflating standard errors.

Normal distribution: A symmetric, bell-shaped probability distribution parameterized by its mean $\mu$ and standard deviation $\sigma$. It is the most important distribution in statistics due to the Central Limit Theorem.

Null hypothesis: A statement of no effect, no difference, or no relationship, serving as the default assumption in hypothesis testing. It is denoted $H_0$ and is either rejected or not rejected based on the evidence.

Outlier: A data point that lies significantly far from other observations in a dataset. Outliers may result from measurement error, data entry mistakes, or genuine extreme values, and they can heavily influence statistical results.
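One common (though not the only) convention for flagging outliers is the 1.5 × IQR fence rule: values below $Q_1 - 1.5\,\mathrm{IQR}$ or above $Q_3 + 1.5\,\mathrm{IQR}$ are flagged. A sketch with invented data:

```python
import statistics

data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12, 14, 13]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as a potential outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)
```

This is the same rule box plots use to draw individual points beyond the whiskers; flagged values still warrant inspection before removal.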

P-value: The probability of observing a test statistic as extreme as or more extreme than the one calculated from sample data, under the assumption that the null hypothesis is true.

Regression: A set of statistical methods for modeling the relationship between a dependent variable and one or more independent variables. Linear regression is the most common form, fitting a line that minimizes squared residuals.
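For simple linear regression the least-squares line has a closed form: the slope is the covariance of $x$ and $y$ divided by the variance of $x$, and the intercept follows from the means. A sketch with invented data points:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# slope = sum of cross-deviations / sum of squared x-deviations
slope = (
    sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    / sum((xi - mean_x) ** 2 for xi in x)
)
# The fitted line always passes through (mean_x, mean_y)
intercept = mean_y - slope * mean_x
print(round(slope, 3), round(intercept, 3))
```

Python 3.11+ offers `statistics.linear_regression` for the same computation.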

Sampling distribution: The probability distribution of a statistic (such as the sample mean) obtained from all possible samples of a given size drawn from a population. It is central to understanding statistical inference.

Standard deviation: A measure of dispersion that quantifies the average distance of data points from the mean. It is the square root of the variance and is expressed in the same units as the original data.

Type I error: The error of rejecting a true null hypothesis (a false positive). The probability of committing a Type I error is denoted by $\alpha$ and is equal to the significance level of the test.

Variance: A measure of data dispersion calculated as the average of the squared deviations from the mean. Sample variance uses $N - 1$ in the denominator (Bessel's correction) to provide an unbiased estimate of the population variance.
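The standard library exposes both denominators directly, which makes Bessel's correction easy to see. A sketch with invented data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # divides by N (population)
samp_var = statistics.variance(data)   # divides by N - 1 (sample)
print(pop_var, samp_var)
```

The sample variance is always slightly larger, and the gap shrinks as $N$ grows, since $N/(N-1) \to 1$.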