Statistics Tutorial: Important Statistics Formulas

This web page presents statistics formulas described in the Stat Trek tutorials. Each formula links to a web page that explains how to use the formula.

Parameters

Population mean = μ = ( Σ X_i ) / N
Population standard deviation = σ = sqrt [ Σ ( X_i - μ )² / N ]
Population variance = σ² = Σ ( X_i - μ )² / N
Variance of population proportion = σ_P² = PQ / n
Standardized score = Z = (X - μ) / σ
Population correlation coefficient =
ρ = [ 1 / N ] * Σ { [ (X_i - μ_X) / σ_x ] * [ (Y_i - μ_Y) / σ_y ] }
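
The population formulas above can be checked numerically. The following is a minimal sketch, assuming the data list is the entire population; the function name `population_summary` and the five scores are hypothetical and used only for illustration.

```python
import math

def population_summary(values):
    """Return (mu, sigma_squared, sigma), treating `values` as the whole population."""
    N = len(values)
    mu = sum(values) / N                                   # population mean
    sigma_squared = sum((x - mu) ** 2 for x in values) / N  # divide by N, not N - 1
    return mu, sigma_squared, math.sqrt(sigma_squared)

# Hypothetical population of five scores.
mu, var, sd = population_summary([2, 4, 4, 6, 9])

# Standardized score of the largest value: Z = (X - mu) / sigma
z = (9 - mu) / sd
```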

Statistics

Unless otherwise noted, these formulas assume simple random sampling.

Sample mean = x̄ = ( Σ x_i ) / n
Sample standard deviation = s = sqrt [ Σ ( x_i - x̄ )² / ( n - 1 ) ]
Sample variance = s² = Σ ( x_i - x̄ )² / ( n - 1 )
Variance of sample proportion = s_p² = pq / (n - 1)
Pooled sample proportion = p = (p₁ * n₁ + p₂ * n₂) / (n₁ + n₂)
Pooled sample standard deviation =
s_p = sqrt { [ (n₁ - 1) * s₁² + (n₂ - 1) * s₂² ] / (n₁ + n₂ - 2) }
Sample correlation coefficient =
r = [ 1 / (n - 1) ] * Σ { [ (x_i - x̄) / s_x ] * [ (y_i - ȳ) / s_y ] }
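
As an illustration of the sample variance and the pooled sample standard deviation, here is a short sketch. The helper names and the two data groups are hypothetical; the pooled value is computed as the square root of the weighted average of the two sample variances, with weights (n₁ - 1) and (n₂ - 1).

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance with the n - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def pooled_sd(xs1, xs2):
    """Pooled sample standard deviation of two groups."""
    n1, n2 = len(xs1), len(xs2)
    weighted = (n1 - 1) * sample_variance(xs1) + (n2 - 1) * sample_variance(xs2)
    return math.sqrt(weighted / (n1 + n2 - 2))

# Hypothetical data for two groups.
group_a = [2, 4, 4, 6, 9]
group_b = [1, 3, 5, 7]

s2_a = sample_variance(group_a)   # agrees with statistics.variance(group_a)
sp = pooled_sd(group_a, group_b)
```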

Correlation

Pearson product-moment correlation = r = Σ (xy) / sqrt [ ( Σ x² ) * ( Σ y² ) ], where x = x_i - x̄ and y = y_i - ȳ are deviation scores

Linear correlation (sample data) =
r = [ 1 / (n - 1) ] * Σ { [ (x_i - x̄) / s_x ] * [ (y_i - ȳ) / s_y ] }

Linear correlation (population data) =

ρ = [ 1 / N ] * Σ { [ (X_i - μ_X) / σ_x ] * [ (Y_i - μ_Y) / σ_y ] }
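
The deviation-score form and the standardized-score form of the sample correlation should give the same value. The sketch below checks this on a small hypothetical data set; the function names are illustrative, and the first function assumes the Pearson formula is applied to deviations from the means.

```python
import math

def pearson_r_deviation(xs, ys):
    """r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), using deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def pearson_r_standardized(xs, ys):
    """r = [1 / (n - 1)] * sum of products of standardized scores."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r_dev = pearson_r_deviation(xs, ys)
r_std = pearson_r_standardized(xs, ys)
```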

Simple Linear Regression

Simple linear regression line: y = b₀ + b₁x
Regression coefficient = b₁ = Σ [ (x_i - x̄) (y_i - ȳ) ] / Σ [ (x_i - x̄)² ]
Regression intercept = b₀ = ȳ - b₁ * x̄
Regression coefficient (equivalent form) = b₁ = r * (s_y / s_x)
Standard error of regression slope =
s_b₁ = sqrt [ Σ(y_i - ŷ_i)² / (n - 2) ] / sqrt [ Σ(x_i - x̄)² ]
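
A least-squares fit can be sketched directly from these formulas. In the example below the function name `fit_line` and the data are hypothetical, and the standard error of the slope is computed from the residuals between each observed y value and its fitted value.

```python
import math

def fit_line(xs, ys):
    """Return (b0, b1, se_b1) for the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept
    # Residual sum of squares: observed y minus fitted y.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return b0, b1, se_b1

# Hypothetical paired observations.
b0, b1, se_b1 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```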

Counting

n factorial: n! = n * (n-1) * (n - 2) * . . . * 3 * 2 * 1. By convention, 0! = 1.
Permutations of n things, taken r at a time: _nP_r = n! / (n - r)!
Combinations of n things, taken r at a time: _nC_r = n! / [ r! (n - r)! ] = _nP_r / r!
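
The counting formulas map directly onto factorials. The sketch below uses hypothetical helper names and compares the results with `math.perm` and `math.comb` from the Python standard library (available in Python 3.8 and later).

```python
import math

def permutations_count(n, r):
    """Ordered selections of r items from n: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)

def combinations_count(n, r):
    """Unordered selections of r items from n: nPr / r!"""
    return permutations_count(n, r) // math.factorial(r)

n_perm = permutations_count(5, 2)   # same value as math.perm(5, 2)
n_comb = combinations_count(5, 2)   # same value as math.comb(5, 2)
```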

Probability

Rule of addition: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Rule of multiplication: P(A ∩ B) = P(A) P(B|A)
Rule of subtraction: P(A') = 1 - P(A)

Random Variables

In the following formulas, X and Y are random variables, and a and b are constants.

Expected value of X = E(X) = μ_x = Σ [ x_i * P(x_i) ]
Variance of X =
Var(X) = σ² = Σ [ x_i - E(x) ]² * P(x_i) = Σ [ x_i - μ_x ]² * P(x_i)
Normal random variable = z-score = z = (X - μ)/σ
Chi-square statistic = Χ² = [ ( n - 1 ) * s² ] / σ²
f statistic = f = [ s₁²/σ₁² ] / [ s₂²/σ₂² ]
Expected value of sum of random variables =
E(X + Y) = E(X) + E(Y)
Expected value of difference between random variables =
E(X - Y) = E(X) - E(Y)
Variance of the sum of independent random variables =
Var(X + Y) = Var(X) + Var(Y)
Variance of the difference between independent random variables =
Var(X - Y) = Var(X) + Var(Y)
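
These rules can be verified by brute force on a small example. The sketch below assumes two independent fair dice (a hypothetical example) and uses exact fractions; the helper names are illustrative. Note that the variances add for both the sum and the difference.

```python
from fractions import Fraction
from itertools import product

def expected_value(dist):
    """E(X) = sum of x * P(x) over a {value: probability} mapping."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = sum of (x - E(X))^2 * P(x)."""
    mu = expected_value(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def combine(dist_x, dist_y, op):
    """Distribution of op(X, Y) for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = op(x, y)
        out[value] = out.get(value, 0) + px * py  # independence: P(x, y) = P(x) * P(y)
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}
total = combine(die, die, lambda x, y: x + y)
diff = combine(die, die, lambda x, y: x - y)
```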

Sampling Distributions

Mean of sampling distribution of the mean = μ_x̄ = μ
Mean of sampling distribution of the proportion = μ_p = P
Standard deviation of proportion = σ_p = sqrt[ P * (1 - P)/n ] = sqrt( PQ / n )
Standard deviation of the mean = σ_x̄ = σ/sqrt(n)
Standard deviation of difference of sample means =
σ_d = sqrt[ (σ₁² / n₁) + (σ₂² / n₂) ]
Standard deviation of difference of sample proportions =
σ_d = sqrt{ [P₁(1 - P₁) / n₁] + [P₂(1 - P₂) / n₂] }

Standard Error

Standard error of proportion = SE_p = s_p = sqrt[ p * (1 - p)/n ] = sqrt( pq / n )
Standard error of difference for proportions =
SE_p = s_p = sqrt{ p * ( 1 - p ) * [ (1/n₁) + (1/n₂) ] }
Standard error of the mean = SE_x̄ = s_x̄ = s/sqrt(n)
Standard error of difference of sample means =
SE_d = s_d = sqrt[ (s₁² / n₁) + (s₂² / n₂) ]
Standard error of difference of paired sample means =
SE_d = s_d = { sqrt [ Σ(d_i - d̄)² / (n - 1) ] } / sqrt(n)
Pooled sample standard error = s_pooled = sqrt { [ (n₁ - 1) * s₁² + (n₂ - 1) * s₂² ] / (n₁ + n₂ - 2) }
Standard error of difference of sample proportions =
s_d = sqrt{ [p₁(1 - p₁) / n₁] + [p₂(1 - p₂) / n₂] }
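
Several of the standard error formulas are sketched below as small functions. The function names and the input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def se_mean(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_diff_means(s1, n1, s2, n2):
    """Standard error of the difference of two independent sample means."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

def se_paired(diffs):
    """Standard error of the mean of paired differences."""
    n = len(diffs)
    d_bar = sum(diffs) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    return s_d / math.sqrt(n)

se_p = se_proportion(0.4, 100)
se_m = se_mean(10, 25)
se_d = se_diff_means(3, 9, 4, 16)
se_pair = se_paired([1, 2, 3, 4, 5])
```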

Discrete Probability Distributions

Binomial formula: P(X = x) = b(x; n, P) = _nC_x * P^x * (1 - P)^{n - x} = _nC_x * P^x * Q^{n - x}
Mean of binomial distribution = μ_x = n * P
Variance of binomial distribution = σ_x² = n * P * ( 1 - P )
Negative Binomial formula: P(X = x) = b*(x; r, P) = _x-1C_r-1 * P^r * (1 - P)^{x - r}
Mean of negative binomial distribution = μ_x = r / P, where x counts trials (the mean number of failures before the r-th success is rQ / P)
Variance of negative binomial distribution = σ_x² = r * Q / P²
Geometric formula: P(X = x) = g(x; P) = P * Q^{x - 1}
Mean of geometric distribution = μ_x = 1 / P, where x counts trials (the mean number of failures before the first success is Q / P)
Variance of geometric distribution = σ_x² = Q / P²
Hypergeometric formula: P(X = x) = h(x; N, n, k) = [ _kC_x ] [ _N-kC_n-x ] / [ _NC_n ]
Mean of hypergeometric distribution = μ_x = n * k / N
Variance of hypergeometric distribution = σ_x² = n * k * ( N - k ) * ( N - n ) / [ N² * ( N - 1 ) ]
Poisson formula: P(x; μ) = (e^-μ) (μ^x) / x!
Mean of Poisson distribution = μ_x = μ
Variance of Poisson distribution = σ_x² = μ
Multinomial formula: P = [ n! / ( n₁! * n₂! * . . . * n_k! ) ] * ( p₁^n₁ * p₂^n₂ * . . . * p_k^n_k )
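
The binomial mean and variance can be recovered by summing over the probability formula itself. The following sketch uses hypothetical parameter values (n = 10, P = 0.3) and illustrative function names; a Poisson probability function is included for comparison.

```python
import math

def binomial_pmf(x, n, P):
    """b(x; n, P) = nCx * P^x * (1 - P)^(n - x)."""
    return math.comb(n, x) * P ** x * (1 - P) ** (n - x)

def poisson_pmf(x, mu):
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)

n, P = 10, 0.3
pmf = [binomial_pmf(x, n, P) for x in range(n + 1)]

# Mean and variance computed from the distribution itself;
# they should match n * P and n * P * (1 - P) up to rounding.
mean = sum(x * p for x, p in enumerate(pmf))
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
```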

Linear Transformations

For the following formulas, assume that Y is a linear transformation of the random variable X, defined by the equation: Y = aX + b.

Mean of a linear transformation = E(Y) = Ȳ = aX̄ + b.
Variance of a linear transformation = Var(Y) = a² * Var(X).
Standardized score = z = (x - μ_x) / σ_x.
t-score = t = (x̄ - μ_x) / [ s/sqrt(n) ].
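
The mean and variance rules for Y = aX + b can be checked on data. The sketch below treats a small hypothetical data set as an equally likely population, and the constants a = 3 and b = -1 are arbitrary. Note that the additive constant b shifts the mean but does not affect the variance.

```python
import statistics

# Hypothetical values of X and arbitrary constants for Y = a * X + b.
xs = [2, 4, 4, 6, 9]
a, b = 3, -1
ys = [a * x + b for x in xs]

mean_x = statistics.fmean(xs)
var_x = statistics.pvariance(xs)   # population variance (divide by N)

mean_y = statistics.fmean(ys)      # should equal a * mean_x + b
var_y = statistics.pvariance(ys)   # should equal a**2 * var_x
```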

Estimation

Confidence interval: Sample statistic ± Critical value * Standard error of statistic
Margin of error = (Critical value) * (Standard deviation of statistic)
Margin of error = (Critical value) * (Standard error of statistic)
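
A confidence interval built from these pieces might look like the sketch below. It assumes the critical value comes from the standard normal distribution (a t critical value would be the usual choice for small samples); the function name and the sample figures (mean 50, s = 10, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(stat, standard_error, confidence=0.95):
    """Return (lower, upper, margin_of_error) using a normal critical value."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = critical * standard_error
    return stat - margin, stat + margin, margin

# Hypothetical sample: mean 50, s = 10, n = 25, so SE = 10 / sqrt(25) = 2.
low, high, me = z_confidence_interval(50.0, 10 / math.sqrt(25))
```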

Hypothesis Testing

Standardized test statistic = (Statistic - Parameter) / (Standard deviation of statistic)
One-sample z-test for proportions: z-score = z = (p - P₀) / sqrt[ P₀ * (1 - P₀) / n ]
Two-sample z-test for proportions: z-score = z = [ (p₁ - p₂) - d ] / SE
One-sample t-test for means: t-score = t = (x̄ - μ) / SE
Two-sample t-test for means: t-score = t = [ (x̄₁ - x̄₂) - d ] / SE
Matched-sample t-test for means: t-score = t = [ (x̄₁ - x̄₂) - D ] / SE = (d̄ - D) / SE
Chi-square test statistic = Χ² = Σ[ (Observed - Expected)² / Expected ]
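
Two of these test statistics are sketched below. The function names, the observed and expected counts, and the sample summary values are hypothetical; the t-score uses s / sqrt(n) as its standard error.

```python
import math

def one_sample_t(x_bar, mu, s, n):
    """t = (x_bar - mu) / SE, with SE = s / sqrt(n)."""
    return (x_bar - mu) / (s / math.sqrt(n))

def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

t = one_sample_t(x_bar=52, mu=50, s=10, n=25)
chi2 = chi_square_statistic([18, 22, 20], [20, 20, 20])
```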

Degrees of Freedom

The correct formula for degrees of freedom (DF) depends on the situation (the nature of the test statistic, the number of samples, underlying assumptions, etc.).

One-sample t-test: DF = n - 1
Two-sample t-test: DF = (s₁²/n₁ + s₂²/n₂)² / { [ (s₁² / n₁)² / (n₁ - 1) ] + [ (s₂² / n₂)² / (n₂ - 1) ] }
Two-sample t-test, pooled standard error: DF = n₁ + n₂ - 2
Simple linear regression, test slope: DF = n - 2
Chi-square goodness of fit test: DF = k - 1
Chi-square test for homogeneity: DF = (r - 1) * (c - 1)
Chi-square test for independence: DF = (r - 1) * (c - 1)
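
The degrees-of-freedom formula for the two-sample t-test without pooling is the easiest one to get wrong by hand, so a small sketch may help. The function name and inputs are hypothetical. When both samples have equal variances and equal sizes, the result coincides with the pooled value n₁ + n₂ - 2.

```python
def two_sample_df(s1_sq, n1, s2_sq, n2):
    """Degrees of freedom for a two-sample t-test with unpooled variances."""
    a = s1_sq / n1
    b = s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Unequal variances and sample sizes (hypothetical values).
df_unequal = two_sample_df(4.0, 10, 9.0, 15)

# Equal variances and equal sizes: matches n1 + n2 - 2 = 18.
df_equal = two_sample_df(4.0, 10, 4.0, 10)
```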

Sample Size

The first two formulas below give the smallest sample size required to achieve a fixed margin of error (ME) under simple random sampling. The third formula allocates the sample across strata under a proportionate design. The fourth formula, Neyman allocation, uses stratified sampling to minimize variance, given a fixed sample size. The last formula, optimum allocation, uses stratified sampling to minimize variance, given a fixed budget.

Mean (simple random sampling): n = { z² * σ² * [ N / (N - 1) ] } / { ME² + [ z² * σ² / (N - 1) ] }
Proportion (simple random sampling): n = [ ( z² * p * q ) + ME² ] / [ ME² + z² * p * q / N ]
Proportionate stratified sampling: n_h = ( N_h / N ) * n
Neyman allocation (stratified sampling): n_h = n * ( N_h * σ_h ) / [ Σ ( N_i * σ_i ) ]
Optimum allocation (stratified sampling):
n_h = n * [ ( N_h * σ_h ) / sqrt( c_h ) ] / Σ [ ( N_i * σ_i ) / sqrt( c_i ) ]
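
The sample size and allocation formulas can be sketched as follows. All function names, stratum sizes, standard deviations, and costs are hypothetical. In practice the computed sample sizes would be rounded up to whole units, which this sketch leaves to the caller.

```python
import math

def sample_size_mean(z, sigma, N, ME):
    """Sample size for estimating a mean under simple random sampling."""
    numerator = z ** 2 * sigma ** 2 * (N / (N - 1))
    denominator = ME ** 2 + z ** 2 * sigma ** 2 / (N - 1)
    return numerator / denominator

def neyman_allocation(n, sizes, sigmas):
    """n_h proportional to N_h * sigma_h."""
    weights = [N_h * s_h for N_h, s_h in zip(sizes, sigmas)]
    total = sum(weights)
    return [n * w / total for w in weights]

def optimum_allocation(n, sizes, sigmas, costs):
    """n_h proportional to N_h * sigma_h / sqrt(c_h)."""
    weights = [N_h * s_h / math.sqrt(c_h) for N_h, s_h, c_h in zip(sizes, sigmas, costs)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical strata: population sizes and standard deviations.
sizes = [1000, 2000, 3000]
sigmas = [10, 20, 30]

n_required = sample_size_mean(z=1.96, sigma=15, N=10000, ME=2)
neyman = neyman_allocation(140, sizes, sigmas)
# With equal per-unit costs, optimum allocation reduces to Neyman allocation.
optimum_equal_cost = optimum_allocation(140, sizes, sigmas, [4, 4, 4])
```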