The first article in the "master a subject in one document" series. The goal of the series is to build an understanding of a whole topic as concisely and clearly as possible, for easy review.
Update record:
2025.01.05 Completed version 1.0 (organized some important concepts and conclusions from the Princeton textbook on probability theory)
## Prerequisite Knowledge
Sum of an arithmetic series: $a_1 + a_2 + \cdots + a_n = \frac{n(a_1 + a_n)}{2}$, where consecutive terms differ by a common difference $d$.
Sum of a geometric series: $a + ar + ar^2 + \cdots + ar^{n-1} = a\,\frac{1 - r^n}{1 - r}$ for $r \neq 1$; if $|r| < 1$, the infinite sum is $\frac{a}{1 - r}$.
Permutations: $P(n, k) = \frac{n!}{(n-k)!}$, the number of ordered arrangements of k items chosen from n.
Combinations: $\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$, the number of unordered selections of k items from n.
Antiderivative:
- If $F' = f$, then F is called an antiderivative of f, or an (indefinite) integral of f.
- Antiderivatives are not unique; different antiderivatives of the same f differ by a constant.
Fundamental Theorem of Calculus: Let f be a piecewise continuous function, and F be any antiderivative of f. Then $\int_a^b f(x)\,dx = F(b) - F(a)$.
The area under the curve y=f(x) between x=a and x=b is equal to the value of the antiderivative of f at b minus the value of the antiderivative of f at a.
Taylor series: If f is n times differentiable, then the n-th Taylor series of f at the point a is $\sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}\,(x - a)^k$.
The Taylor series at the origin (a=0) is also known as the Maclaurin series.
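As a quick sanity check of the Taylor-series idea (not part of the original notes), the following Python sketch compares partial Maclaurin sums of $e^x$ with the exact value; the function name and the sample point are chosen only for illustration.

```python
import math

def maclaurin_exp(x, n):
    """Partial sum of the Maclaurin series of e^x, up to and including the x^n term."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

x = 1.0
for n in (2, 4, 8):
    # More terms (more known derivatives at the origin) give a better approximation.
    print(n, maclaurin_exp(x, n), math.exp(x))
```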
## Basic Probability Theorems
Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$, defined when $P(B) > 0$.
Independence: If events A and B satisfy $P(A \cap B) = P(A)\,P(B)$, then A and B are independent.
Commutativity: $P(A \cap B) = P(B \cap A)$.
Total probability formula: If $\{B_1, B_2, \dots\}$ forms a partition of the sample space S (divided into at most countably many parts), then for any event $A$, we have $P(A) = \sum_i P(A \mid B_i)\,P(B_i)$.
Bayes' Theorem: Let $\{B_1, B_2, \dots\}$ be a partition of the sample space, then $P(B_j \mid A) = \dfrac{P(A \mid B_j)\,P(B_j)}{\sum_i P(A \mid B_i)\,P(B_i)}$. (Based on conditional probability: the numerator is transformed using commutativity + conditional probability, and the denominator is the total probability formula.)
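As an illustration, here is a minimal Python sketch of Bayes' theorem on a two-part partition; the prevalence and test accuracies are made-up numbers, not values from the text.

```python
# Hypothetical numbers: P(B) is the prior, A is the event "test is positive".
p_B = 0.01              # P(B)
p_A_given_B = 0.95      # P(A | B)
p_A_given_notB = 0.05   # P(A | not B)

# Denominator: total probability formula over the partition {B, not B}.
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)

# Bayes' theorem: P(B | A) = P(A | B) P(B) / P(A).
p_B_given_A = p_A_given_B * p_B / p_A
print(p_B_given_A)  # ~0.161: the posterior is much smaller than P(A | B)
```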
## Random Variables
### Discrete Random Variables
- Probability density function (PDF): $f_X(x) = P(X = x)$.
- Cumulative distribution function (CDF): $F_X(x) = P(X \le x) = \sum_{t \le x} f_X(t)$.
### Continuous Random Variables
- Let X be a random variable. If there exists a real-valued function $f_X$ such that: $f_X$ is a piecewise continuous function, $f_X(x) \ge 0$, and $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, then X is a continuous random variable, and $f_X$ is the probability density function of X, with $P(a \le X \le b) = \int_a^b f_X(x)\,dx$.
- Cumulative distribution function (CDF): $F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$.
Expected Value: Let X be a random variable defined on R, with probability density function $f_X$. The expected value of the function $g(X)$ is $E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$.
If $g(x) = x^r$, then $E[X^r]$ is called the r-th moment of X, and $E[(X - \mu_X)^r]$ is called the r-th central moment of X.
(Why care about moments: the more Taylor coefficients you know, the better you can approximate a function; similarly, the more moments you know, the better you understand the shape of the probability density function.)
- The mean (average, expected value, denoted $\mu_X$ or $E[X]$) of X is the first moment: $\mu_X = E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$.
- The variance of X (denoted $\mathrm{Var}(X)$ or $\sigma_X^2$) is the second central moment, the expected value of $(X - \mu_X)^2$: $\mathrm{Var}(X) = E[(X - \mu_X)^2]$.
- The standard deviation is the square root of the variance, $\sigma_X = \sqrt{\mathrm{Var}(X)}$. (A numerical check of these definitions follows below.)
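A minimal numerical sketch of these definitions, assuming scipy and numpy are available; the exponential density and its rate are chosen only for illustration.

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0  # rate of an illustrative exponential density

def pdf(x):
    return lam * np.exp(-lam * x)

# First moment (the mean) and second central moment (the variance) by direct integration.
mean, _ = quad(lambda x: x * pdf(x), 0, np.inf)
var, _ = quad(lambda x: (x - mean) ** 2 * pdf(x), 0, np.inf)
print(mean, var, np.sqrt(var))  # ~1/lambda, ~1/lambda^2, and the standard deviation
```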
Let X and Y be continuous random variables.
Joint Probability Density Function: $f_{X,Y}(x, y)$, with $P\big((X, Y) \in A\big) = \iint_A f_{X,Y}(x, y)\,dx\,dy$.
The marginal probability density function of X: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$.
Properties of Expectation:
- The expectation of a sum equals the sum of expectations: $E[X + Y] = E[X] + E[Y]$.
- Let X be a random variable with mean $\mu$ and variance $\sigma^2$. Then the mean and variance of the random variable $Y = aX + b$ are: $E[Y] = a\mu + b$ and $\mathrm{Var}(Y) = a^2\sigma^2$.
- Let X be a random variable, then $\mathrm{Var}(X) = E[X^2] - (E[X])^2$.
Properties of Mean and Variance:
- If X and Y are independent random variables, then $E[XY] = E[X]\,E[Y]$, and $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
- Mean and Variance of the Sum of Random Variables: Let $X_1, \dots, X_n$ be n random variables with means $\mu_1, \dots, \mu_n$ and variances $\sigma_1^2, \dots, \sigma_n^2$.
Let $X = X_1 + \cdots + X_n$; then the mean of X is $E[X] = \mu_1 + \cdots + \mu_n$.
When the random variables are independent, $\mathrm{Var}(X) = \sigma_1^2 + \cdots + \sigma_n^2$.
- Covariance: $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]\,E[Y]$.
The covariance of two independent random variables is 0, but a covariance of 0 does not imply independence (e.g., X is a symmetrically distributed random variable with mean 0 and $Y = X^2$: then $\mathrm{Cov}(X, Y) = 0$, yet X and Y are dependent).
If $\mathrm{Cov}(X, Y) \neq 0$, then X and Y are not independent.
- Correlation Coefficient (essentially a normalization of covariance): $\rho_{XY} = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X\,\sigma_Y}$, with $-1 \le \rho_{XY} \le 1$.
Covariance/correlation coefficient describes the linear relationship between two variables.
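The properties above can be checked empirically with a short simulation; this is an illustrative sketch assuming numpy, and the distributions and constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=1_000_000)   # mean 1, variance 4
y = rng.normal(loc=-3.0, scale=1.0, size=1_000_000)  # independent of x

# E[X + Y] = E[X] + E[Y]
print(np.mean(x + y), np.mean(x) + np.mean(y))

# Var(aX + b) = a^2 Var(X)
a, b = 3.0, 5.0
print(np.var(a * x + b), a**2 * np.var(x))

# Independent => covariance ~ 0; zero covariance does not imply independence:
print(np.cov(x, y)[0, 1])       # ~0 for independent x, y
z = rng.normal(size=1_000_000)  # symmetric with mean 0
print(np.cov(z, z**2)[0, 1])    # ~0, yet z and z^2 are clearly dependent
```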
## Special Distributions
| Name | Probability Density Function | Mean | Variance | Remarks |
|---|---|---|---|---|
| Bernoulli Distribution | $P(X=1) = p,\ P(X=0) = 1-p$ | $p$ | $p(1-p)$ | |
| Binomial Distribution | $P(X=k) = \binom{n}{k} p^k (1-p)^{n-k},\ k = 0, \dots, n$ | $np$ | $np(1-p)$ | Number of heads in n independent coin flips |
| Geometric Distribution | $P(X=k) = (1-p)^{k-1} p,\ k = 1, 2, \dots$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ | Number of trials until the first success |
| Exponential Distribution | $f(x) = \lambda e^{-\lambda x},\ x \ge 0$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^2}$ | |
| Normal Distribution $N(\mu, \sigma^2)$ | $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\mu$ | $\sigma^2$ | |
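If scipy is available, the tabulated means and variances can be sanity-checked directly; the parameter values below are arbitrary and only serve as an example.

```python
from scipy import stats

n, p, lam, mu, sigma = 10, 0.3, 2.0, 1.0, 0.5  # illustrative parameters

print(stats.bernoulli.stats(p, moments="mv"))               # (p, p(1-p))
print(stats.binom.stats(n, p, moments="mv"))                # (np, np(1-p))
print(stats.geom.stats(p, moments="mv"))                    # (1/p, (1-p)/p^2)
print(stats.expon.stats(scale=1/lam, moments="mv"))         # (1/lambda, 1/lambda^2)
print(stats.norm.stats(loc=mu, scale=sigma, moments="mv"))  # (mu, sigma^2)
```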
Cumulative Distribution Method for Generating Random Numbers (Inverse Transform Method): Let X be a random variable with probability density function $f_X$ and cumulative distribution function $F_X$. If Y is a random variable uniformly distributed over [0, 1], then $F_X^{-1}(Y)$ has the same distribution as X, so samples of X can be generated by applying $F_X^{-1}$ to uniform samples.
(Refer to: Rendering and Sampling (1): Inverse Transform Sampling—Principles and Practical Applications - ZUIcat's Article - Zhihu)
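A minimal sketch of inverse transform sampling, using the exponential distribution because its CDF inverts in closed form; numpy is assumed and the rate parameter is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.5  # illustrative rate parameter

# For the exponential distribution, F(x) = 1 - exp(-lam * x),
# so the inverse CDF is F^{-1}(y) = -ln(1 - y) / lam.
y = rng.uniform(0.0, 1.0, size=100_000)  # Y ~ Uniform[0, 1]
samples = -np.log(1.0 - y) / lam         # X = F^{-1}(Y)

print(samples.mean(), 1 / lam)      # sample mean ~ 1/lambda
print(samples.var(), 1 / lam**2)    # sample variance ~ 1/lambda^2
```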
## Hypothesis Testing
Null Hypothesis: Usually the opposite of the conclusion you want to prove. Assume the null hypothesis is correct and try to use data to refute it.
Alternative Hypothesis: The conclusion you want to prove.
### Z-test
- Let X be a normally distributed random variable with a known variance $\sigma^2$, and assume (the null hypothesis) that its mean is $\mu_0$.
- Let $x_1, \dots, x_n$ be n independent observations drawn from this distribution, and let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ be the sample mean.
- The observed z-test statistic: $z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$. Under the null hypothesis it follows a normal distribution with mean 0 and variance 1.
- Based on the probability of the z-test statistic deviating from 0 at least as far as observed (the p-value), if $p$ is smaller than the chosen significance level $\alpha$ (e.g., 0.05), then reject the null hypothesis. (p actually expresses the probability of observing data at least as extreme as the current sample under the premise that the null hypothesis is true.)
- One-tailed test, two-tailed test: The difference lies in whether the parameter being measured is greater than (or less than) a certain value, or whether there is a significant difference from a certain value.
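A worked z-test sketch with made-up observations; sigma, mu0, and the data are invented for illustration, and scipy/numpy are assumed.

```python
import numpy as np
from scipy.stats import norm

sigma, mu0 = 2.0, 10.0  # known standard deviation and the null-hypothesis mean
x = np.array([10.8, 11.2, 9.9, 10.5, 11.0, 10.7, 10.3, 11.4])  # hypothetical observations
n, xbar = len(x), x.mean()

z = (xbar - mu0) / (sigma / np.sqrt(n))   # z-statistic, N(0, 1) under the null hypothesis
p_two_sided = 2 * (1 - norm.cdf(abs(z)))  # two-tailed p-value
print(z, p_two_sided)  # reject the null hypothesis if p is below the chosen significance level
```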
### T-test
- If no information about the variance is known, then the sample variance needs to be calculated to estimate it: $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
- Compared to the conventional variance, the denominator here is n-1. (When there is only one sample, it is actually impossible to estimate the variance.)
- T-test statistic: $t = \dfrac{\bar{x} - \mu_0}{s / \sqrt{n}}$, which follows a t-distribution with n-1 degrees of freedom. (Correspondingly, the p-value needs to be calculated from the t-distribution; the more degrees of freedom, the closer it approaches the normal distribution.)
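The same made-up data as in the z-test sketch, now with the t-test because the variance is treated as unknown; this is an illustrative sketch, cross-checked against scipy's built-in one-sample t-test.

```python
import numpy as np
from scipy import stats

mu0 = 10.0
x = np.array([10.8, 11.2, 9.9, 10.5, 11.0, 10.7, 10.3, 11.4])  # hypothetical observations
n, xbar = len(x), x.mean()

s2 = np.sum((x - xbar) ** 2) / (n - 1)  # sample variance with denominator n-1
t = (xbar - mu0) / np.sqrt(s2 / n)      # t-statistic with n-1 degrees of freedom
p_two_sided = 2 * (1 - stats.t.cdf(abs(t), df=n - 1))

print(t, p_two_sided)
print(stats.ttest_1samp(x, mu0))  # should agree with the manual computation above
```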