Generated by AskSia.ai — graphs, formulas, traps
Probability: P(A) ∈ [0,1]. The probabilities of all mutually exclusive outcomes in the sample space sum to 1.
P(A or B) = P(A) + P(B) − P(A and B)
P(A and B) = P(A) · P(B|A)
P(A|B) = P(A and B) / P(B)

| Concept | Formula | Cue |
|---|---|---|
| Independent | P(A∩B) = P(A)·P(B) | knowing one tells nothing about the other |
| Mutually excl. | P(A∩B) = 0 | can't both happen |
| Conditional | P(A\|B) = P(A∩B)/P(B) | 'given that B' |
Independence and mutual exclusivity are opposites, not the same. Mutually exclusive events are maximally dependent: if one happens, the other definitely doesn't. Two distinct outcomes can't be both independent AND mutually exclusive (unless one has P = 0).
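These rules can be checked by brute-force enumeration over a small sample space. A minimal sketch, using a hypothetical two-dice example (the events `A` and `B` are illustrative choices, not from the text above):

```python
from fractions import Fraction

# Sample space: two fair six-sided dice (hypothetical example).
space = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def prob(event):
    """P(event) = favorable outcomes / total outcomes (equally likely)."""
    return Fraction(sum(event(o) for o in space), len(space))

A = lambda o: o[0] == 6            # first die shows 6
B = lambda o: o[0] + o[1] >= 10    # sum is at least 10

p_a, p_b = prob(A), prob(B)
p_ab = prob(lambda o: A(o) and B(o))

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_ab / p_b
print(p_a_given_b)                 # 1/2: knowing the sum is ≥ 10 raises the chance of a 6

# A and B are NOT independent: P(A∩B) ≠ P(A)·P(B)
print(p_ab == p_a * p_b)           # False
```

Using `Fraction` keeps the arithmetic exact, so the identities hold with `==` rather than a floating-point tolerance.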
The chi-square test checks whether categorical data matches an expected distribution.
χ² = Σ (Observed − Expected)² / Expected

| Test | Use when | df |
|---|---|---|
| Goodness of fit | 1 categorical var matches a distribution | k−1 |
| Independence | 2 categorical vars unrelated | (r−1)(c−1) |
| Homogeneity | Same distribution across groups | (r−1)(c−1) |
Big χ² → reject H₀ → data doesn't match expected.
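The goodness-of-fit statistic is short enough to compute by hand. A sketch with hypothetical die-roll counts (the observed values and the critical value 11.07 for α = 0.05, df = 5 are illustrative assumptions):

```python
# Chi-square goodness-of-fit by hand (hypothetical die-fairness data).
observed = [8, 12, 9, 11, 10, 10]     # 60 rolls of a six-sided die
expected = [60 / 6] * 6               # fair die: 10 per face

# χ² = Σ (Observed − Expected)² / Expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # goodness of fit: df = k − 1 = 5

# Critical value for α = 0.05, df = 5 is about 11.07 (from a χ² table).
print(chi2)                           # 1.0 — far below 11.07 → fail to reject H₀
```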
Tests if 3+ group means are equal (vs at least one different).
F = MS_between / MS_within
F large → between-group variance dominates → reject H₀.

Don't run 6 pairwise t-tests on 4 groups: that's a family-wise error rate explosion. With α = 0.05 per test, the overall false-positive rate balloons to roughly 1 − 0.95⁶ ≈ 26%. Use ANOVA first, then post-hoc tests.
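The F-statistic above can be computed from scratch in a few lines. A sketch with three hypothetical groups (the data are illustrative):

```python
# One-way ANOVA F-statistic by hand (hypothetical data, 3 groups).
groups = [[4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]

k = len(groups)                       # number of groups
n = sum(len(g) for g in groups)       # total observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group sum of squares: spread of group means around the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread inside each group.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)     # df_between = k − 1
ms_within = ss_within / (n - k)       # df_within = n − k
F = ms_between / ms_within
print(F)                              # 13.0 for this data
```

With this data the between-group variance clearly dominates, so a large F (13.0) points toward rejecting H₀.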
| Discrete (PMF) | Continuous (PDF) |
|---|---|
| P(X=k) ≥ 0 | f(x) ≥ 0 |
| Σ P(X=k) = 1 | ∫ f(x)dx = 1 |
| E[X] = Σ k·P(X=k) | E[X] = ∫ x·f(x)dx |
Var(X) = E[X²] − (E[X])²
SD(X) = √Var(X)

| Distribution | Mean | Variance | Use |
|---|---|---|---|
| Bernoulli(p) | p | p(1−p) | 1 trial, success/fail |
| Binomial(n,p) | np | np(1−p) | n indep. trials |
| Poisson(λ) | λ | λ | rare events / time |
| Normal(μ,σ²) | μ | σ² | continuous, bell shape |
| Uniform(a,b) | (a+b)/2 | (b−a)²/12 | equal density on [a,b] |
| Exponential(λ) | 1/λ | 1/λ² | waiting time |
For continuous random variables, P(X = x) = 0 always. Probability lives in intervals, not points. P(X < 5) and P(X ≤ 5) are equal — the boundary contributes nothing.
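The expectation and variance formulas for a discrete PMF are easy to verify numerically. A sketch with a hypothetical three-point PMF, checking that the shortcut Var(X) = E[X²] − (E[X])² agrees with the definitional form E[(X − μ)²]:

```python
# Verify Var(X) = E[X²] − (E[X])² for a small discrete PMF (hypothetical).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}                   # P(X = k); sums to 1

ex = sum(k * p for k, p in pmf.items())           # E[X] = Σ k·P(X=k)
ex2 = sum(k ** 2 * p for k, p in pmf.items())     # E[X²]
var = ex2 - ex ** 2                               # shortcut formula

# Same answer via the definitional form E[(X − μ)²]:
var_def = sum((k - ex) ** 2 * p for k, p in pmf.items())
print(ex, var)                                    # 1.1 0.49
```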
1. State H₀ (null) and H_a (alt)
2. Choose α (usually 0.05)
3. Compute test statistic (z, t, χ², F)
4. Find p-value or compare to critical value
5. p < α → reject H₀; p ≥ α → fail to reject
| Test | Use when | Statistic |
|---|---|---|
| z-test | σ known, large n | (x̄−μ₀)/(σ/√n) |
| t-test (1-sample) | σ unknown, use s | (x̄−μ₀)/(s/√n) |
| 2-sample t | compare 2 means | (x̄₁−x̄₂)/SE |
| Paired t | before/after on same units | d̄/(s_d/√n) |
| Proportion z | comparing p̂ to p₀ | (p̂−p₀)/√(p₀(1−p₀)/n) |
p-value: probability of seeing data this extreme or more, IF H₀ were true. Small p = data unlikely under H₀ = evidence against H₀.
p = 0.03 does NOT mean 'H₀ has 3% probability of being true.' It means: IF H₀ were true, we'd see data this extreme 3% of the time. p-value is about data, not hypotheses.
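The five-step recipe can be run end to end with only the standard library, since the normal CDF is available via the error function. A sketch of a two-sided one-sample z-test with hypothetical numbers (μ₀, σ, n, and x̄ are illustrative):

```python
import math

# One-sample z-test sketch (σ known; hypothetical numbers).
mu0, sigma, n = 100.0, 15.0, 36
xbar = 106.0

# Step 3: test statistic z = (x̄ − μ₀)/(σ/√n)
z = (xbar - mu0) / (sigma / math.sqrt(n))

def normal_cdf(x):
    """Φ(x) via the error function (standard library only)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Step 4: p-value = probability of data this extreme or more, in both tails
p_two_sided = 2 * (1 - normal_cdf(abs(z)))
print(z, p_two_sided)     # z = 2.4, p ≈ 0.016 < 0.05 → reject H₀
```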
ŷ = b₀ + b₁·x
b₁ = r·(s_y/s_x)
b₀ = ȳ − b₁·x̄

Slope b₁: predicted change in y per 1-unit change in x. r is the correlation coefficient ∈ [−1, 1].
| Quantity | Meaning | Range |
|---|---|---|
| r | linear correlation | [−1, 1] |
| R² = r² | % variance explained | [0, 1] |
| residual | y − ŷ (vertical gap) | any real |
| SE(b₁) | uncertainty in slope | ≥ 0 |
t = b₁ / SE(b₁), df = n − 2

H₀: β₁ = 0 (no linear relation). Reject if |t| > critical value.
r = 0.9 between ice cream sales and drownings doesn't mean ice cream causes drowning. Hot weather causes both. Always consider lurking variables and direction.
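The two routes to the slope, b₁ = r·(s_y/s_x) from summary statistics and direct least squares, give the same answer. A sketch on a small hypothetical dataset:

```python
import math

# Regression coefficients two ways (hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b1 = sxy / sxx                 # least-squares slope
b0 = ybar - b1 * xbar          # line passes through (x̄, ȳ)

# Same slope via b₁ = r·(s_y/s_x):
r = sxy / math.sqrt(sxx * syy)
s_x, s_y = math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1))
assert abs(b1 - r * s_y / s_x) < 1e-12

print(b0, b1)                  # 2.2 0.6 → ŷ = 2.2 + 0.6x
```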
A sampling distribution is the distribution of a statistic (like x̄) computed across many samples of size n.
For sample mean x̄:
E[x̄] = μ
SD(x̄) = σ/√n (the standard error)

For any distribution with finite μ, σ:
x̄ → Normal(μ, σ²/n) as n → ∞

Rule of thumb: at n ≥ 30 the CLT kicks in even for skewed parents. For symmetric parents, much smaller n works.
Proportions: p̂ ~ Normal(p, p(1−p)/n) when np ≥ 10 AND n(1−p) ≥ 10.
CLT is about the sample mean distribution, not the original variable. If X is heavily skewed, X stays skewed forever. Only x̄ becomes Normal as n grows.
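Both claims, E[x̄] = μ and SD(x̄) = σ/√n, can be seen in a quick simulation. A sketch drawing sample means from a skewed Exponential(1) parent (μ = σ = 1); the sample size, repetition count, and seed are arbitrary choices:

```python
import math
import random

# CLT sketch: sample means from a skewed parent (Exponential, μ = σ = 1).
random.seed(42)                      # reproducible run
n, reps = 50, 2000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(reps)]

avg = sum(means) / reps              # should be close to μ = 1
se = math.sqrt(sum((m - avg) ** 2 for m in means) / reps)
print(avg, se)                       # avg ≈ 1, se ≈ 1/√50 ≈ 0.141
```

The parent stays skewed no matter what; only the distribution of `means` tightens into a Normal(μ, σ²/n) shape as n grows.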
CI = point estimate ± (critical value) × (standard error)

| Parameter | CI Form | When |
|---|---|---|
| μ (σ known) | x̄ ± z*·(σ/√n) | rare, only if σ given |
| μ (σ unknown) | x̄ ± t*·(s/√n) | standard, df = n−1 |
| p (proportion) | p̂ ± z*·√(p̂(1−p̂)/n) | np̂ ≥ 10, n(1−p̂) ≥ 10 |
| μ₁−μ₂ (2-sample) | (x̄₁−x̄₂) ± t*·SE | compare 2 means |
90% CI → z* = 1.645; 95% → 1.96; 99% → 2.576

'There's a 95% chance μ is in (3.2, 4.8)' is wrong. μ is fixed; the interval is random. Better: 'we used a procedure that captures μ 95% of the time.'
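The CI template is a one-liner once the pieces are in hand. A sketch for the σ-known case (rare in practice, but the arithmetic is clearest); x̄, σ, and n are hypothetical:

```python
import math

# 95% CI for μ with σ known: x̄ ± z*·(σ/√n).
# Hypothetical numbers: x̄ = 4.0, σ = 2.0, n = 25; z* = 1.96 for 95%.
xbar, sigma, n, z_star = 4.0, 2.0, 25, 1.96

margin = z_star * sigma / math.sqrt(n)     # critical value × standard error
lo, hi = xbar - margin, xbar + margin
print(lo, hi)                              # 3.216 4.784
```

For σ unknown, swap in s and a t* critical value with df = n − 1, per the table above.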
| If you see… | Use § |
|---|---|
| 'P(A and B)', 'and / or' | §1 probability rules |
| 'P(A\|B)', 'given that' | §1 conditional / Bayes |
| 'mean of', 'expected value' | §2 distributions |
| '68/95/99.7', z-score | §2 Normal |
| 'sample mean', large n | §3 CLT |
| 'estimate μ with margin' | §4 confidence interval |
| 'is the mean equal to' | §5 t-test or z-test |
| 2 group comparison | §5 2-sample t-test |
| before/after same subjects | §5 paired t-test |
| 'is proportion equal to' | §5 proportion z-test |
| 3+ group means | §6 ANOVA |
| categorical data, 'fits a distribution' | §6 chi-square goodness |
| 2 categorical vars, 'related' | §6 chi-square independence |
| 'predict y from x' | §7 regression |
Match the data type: categorical → χ² or proportion. Numerical → t/z/ANOVA.
Match the question: 'estimate' → CI. 'is it equal' → hypothesis test. 'predict' → regression.
Match the # groups: 1 group → 1-sample. 2 groups → 2-sample. 3+ groups → ANOVA.
Always check assumptions: Normal? Independent? Sample size adequate?
Every test has assumptions (normality, independence, equal variance, etc.). Apply a t-test to grossly skewed data with n=8 and you're producing nonsense, not statistics. Always state and check.
With huge n, even tiny effects become 'statistically significant' (p < 0.05). Always also report effect size — does the difference matter in real life?