Generated by AskSia.ai — graphs, formulas, traps
Probability: P(A) ∈ [0,1]. The probabilities of all mutually exclusive outcomes in the sample space sum to 1.
P(A or B) = P(A) + P(B) − P(A and B)
P(A and B) = P(A) · P(B|A)
P(A|B) = P(A and B) / P(B)

| Concept | Formula | Cue |
|---|---|---|
| Independent | P(A∩B) = P(A)·P(B) | knowing one tells nothing about the other |
| Mutually excl. | P(A∩B) = 0 | can't both happen |
| Conditional | P(A\|B) = P(A∩B)/P(B) | 'given that B' |
Independence and mutual exclusivity are opposites, not the same. Mutually exclusive events are maximally dependent: if one happens, the other definitely doesn't. Two distinct outcomes can't be both independent AND mutually exclusive (unless one has P = 0).
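These rules can be checked by brute-force enumeration over a small sample space. A minimal sketch, using a hypothetical two-dice example (the events `A` and `B` are illustrative choices, not from the text above):

```python
from fractions import Fraction

# Sample space: two fair six-sided dice (hypothetical example).
space = [(a, b) for a in range(1, 7) for b in range(1, 7)]

def prob(event):
    """P(event) = favorable outcomes / total outcomes (equally likely)."""
    return Fraction(sum(event(o) for o in space), len(space))

A = lambda o: o[0] == 6            # first die shows 6
B = lambda o: o[0] + o[1] >= 10    # sum is at least 10

p_a, p_b = prob(A), prob(B)
p_ab = prob(lambda o: A(o) and B(o))

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = p_ab / p_b
print(p_a_given_b)                 # 1/2: knowing the sum is ≥ 10 raises the chance of a 6

# A and B are NOT independent: P(A∩B) ≠ P(A)·P(B)
print(p_ab == p_a * p_b)           # False
```

Using `Fraction` keeps the arithmetic exact, so the identities hold with `==` rather than a floating-point tolerance.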
The chi-square test checks whether categorical data matches an expected distribution.
χ² = Σ (Observed − Expected)² / Expected

| Test | Use when | df |
|---|---|---|
| Goodness of fit | 1 categorical var matches a distribution | k−1 |
| Independence | 2 categorical vars unrelated | (r−1)(c−1) |
| Homogeneity | Same distribution across groups | (r−1)(c−1) |
Big χ² → reject H₀ → data doesn't match expected.
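The goodness-of-fit statistic is short enough to compute by hand. A sketch with hypothetical die-roll counts (the observed values and the critical value 11.07 for α = 0.05, df = 5 are illustrative assumptions):

```python
# Chi-square goodness-of-fit by hand (hypothetical die-fairness data).
observed = [8, 12, 9, 11, 10, 10]     # 60 rolls of a six-sided die
expected = [60 / 6] * 6               # fair die: 10 per face

# χ² = Σ (Observed − Expected)² / Expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # goodness of fit: df = k − 1 = 5

# Critical value for α = 0.05, df = 5 is about 11.07 (from a χ² table).
print(chi2)                           # 1.0 — far below 11.07 → fail to reject H₀
```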
Tests if 3+ group means are equal (vs at least one different).
F = MS_between / MS_within
F large → between-group variance dominates → reject H₀.

Don't run 6 pairwise t-tests on 4 groups: that's a family-wise error rate explosion. With α = 0.05 per test, the overall false-positive rate balloons to roughly 1 − 0.95⁶ ≈ 26%. Use ANOVA first, then post-hoc tests.
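The F-statistic above can be computed from scratch in a few lines. A sketch with three hypothetical groups (the data are illustrative):

```python
# One-way ANOVA F-statistic by hand (hypothetical data, 3 groups).
groups = [[4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]

k = len(groups)                       # number of groups
n = sum(len(g) for g in groups)       # total observations
grand_mean = sum(sum(g) for g in groups) / n

# Between-group sum of squares: spread of group means around the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread inside each group.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)     # df_between = k − 1
ms_within = ss_within / (n - k)       # df_within = n − k
F = ms_between / ms_within
print(F)                              # 13.0 for this data
```

With this data the between-group variance clearly dominates, so a large F (13.0) points toward rejecting H₀.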
| Discrete (PMF) | Continuous (PDF) |
|---|---|
| P(X=k) ≥ 0 | f(x) ≥ 0 |
| Σ P(X=k) = 1 | ∫ f(x)dx = 1 |
| E[X] = Σ k·P(X=k) | E[X] = ∫ x·f(x)dx |
Var(X) = E[X²] − (E[X])²
SD(X) = √Var(X)

| Distribution | Mean | Variance | Use |
|---|---|---|---|
| Bernoulli(p) | p | p(1−p) | 1 trial, success/fail |
| Binomial(n,p) | np | np(1−p) | n indep. trials |
| Poisson(λ) | λ | λ | rare events / time |
| Normal(μ,σ²) | μ | σ² | continuous, bell shape |
| Uniform(a,b) | (a+b)/2 | (b−a)²/12 | equal density on [a,b] |
| Exponential(λ) | 1/λ | 1/λ² | waiting time |
For continuous random variables, P(X = x) = 0 always. Probability lives in intervals, not points. P(X < 5) and P(X ≤ 5) are equal — the boundary contributes nothing.
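The expectation and variance formulas for a discrete PMF are easy to verify numerically. A sketch with a hypothetical three-point PMF, checking that the shortcut Var(X) = E[X²] − (E[X])² agrees with the definitional form E[(X − μ)²]:

```python
# Verify Var(X) = E[X²] − (E[X])² for a small discrete PMF (hypothetical).
pmf = {0: 0.2, 1: 0.5, 2: 0.3}                   # P(X = k); sums to 1

ex = sum(k * p for k, p in pmf.items())           # E[X] = Σ k·P(X=k)
ex2 = sum(k ** 2 * p for k, p in pmf.items())     # E[X²]
var = ex2 - ex ** 2                               # shortcut formula

# Same answer via the definitional form E[(X − μ)²]:
var_def = sum((k - ex) ** 2 * p for k, p in pmf.items())
print(ex, var)                                    # 1.1 0.49
```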
1. State H₀ (null) and H_a (alt)
2. Choose α (usually 0.05)
3. Compute test statistic (z, t, χ², F)
4. Find p-value or compare to critical value
5. p < α → reject H₀; p ≥ α → fail to reject
| Test | Use when | Statistic |
|---|---|---|
| z-test | σ known, large n | (x̄−μ₀)/(σ/√n) |
| t-test (1-sample) | σ unknown, use s | (x̄−μ₀)/(s/√n) |
| 2-sample t | compare 2 means | (x̄₁−x̄₂)/SE |
| Paired t | before/after on same units | d̄/(s_d/√n) |
| Proportion z | comparing p̂ to p₀ | (p̂−p₀)/√(p₀(1−p₀)/n) |
p-value: probability of seeing data this extreme or more, IF H₀ were true. Small p = data unlikely under H₀ = evidence against H₀.
p = 0.03 does NOT mean 'H₀ has 3% probability of being true.' It means: IF H₀ were true, we'd see data this extreme 3% of the time. p-value is about data, not hypotheses.
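The five-step recipe can be run end to end with only the standard library, since the normal CDF is available via the error function. A sketch of a two-sided one-sample z-test with hypothetical numbers (μ₀, σ, n, and x̄ are illustrative):

```python
import math

# One-sample z-test sketch (σ known; hypothetical numbers).
mu0, sigma, n = 100.0, 15.0, 36
xbar = 106.0

# Step 3: test statistic z = (x̄ − μ₀)/(σ/√n)
z = (xbar - mu0) / (sigma / math.sqrt(n))

def normal_cdf(x):
    """Φ(x) via the error function (standard library only)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Step 4: p-value = probability of data this extreme or more, in both tails
p_two_sided = 2 * (1 - normal_cdf(abs(z)))
print(z, p_two_sided)     # z = 2.4, p ≈ 0.016 < 0.05 → reject H₀
```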
ŷ = b₀ + b₁·x
b₁ = r·(s_y/s_x)
b₀ = ȳ − b₁·x̄

Slope b₁: predicted change in y per 1-unit change in x. r is the correlation coefficient ∈ [−1, 1].
| Quantity | Meaning | Range |
|---|---|---|
| r | linear correlation | [−1, 1] |
| R² = r² | % variance explained | [0, 1] |
| residual | y − ŷ (vertical gap) | any real |
| SE(b₁) | uncertainty in slope | ≥ 0 |
t = b₁ / SE(b₁), df = n − 2

H₀: β₁ = 0 (no linear relation). Reject if |t| > critical value.
r = 0.9 between ice cream sales and drownings doesn't mean ice cream causes drowning. Hot weather causes both. Always consider lurking variables and direction.
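The two routes to the slope, b₁ = r·(s_y/s_x) from summary statistics and direct least squares, give the same answer. A sketch on a small hypothetical dataset:

```python
import math

# Regression coefficients two ways (hypothetical data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b1 = sxy / sxx                 # least-squares slope
b0 = ybar - b1 * xbar          # line passes through (x̄, ȳ)

# Same slope via b₁ = r·(s_y/s_x):
r = sxy / math.sqrt(sxx * syy)
s_x, s_y = math.sqrt(sxx / (n - 1)), math.sqrt(syy / (n - 1))
assert abs(b1 - r * s_y / s_x) < 1e-12

print(b0, b1)                  # 2.2 0.6 → ŷ = 2.2 + 0.6x
```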
A sampling distribution is the distribution of a statistic (like x̄) computed across many samples of size n.
For sample mean x̄:
E[x̄] = μ
SD(x̄) = σ/√n (the standard error)

For any distribution with finite μ, σ:
x̄ → Normal(μ, σ²/n) as n → ∞

Rule of thumb: at n ≥ 30 the CLT kicks in even for skewed parents. For symmetric parents, much smaller n works.
Proportions: p̂ ~ Normal(p, p(1−p)/n) when np ≥ 10 AND n(1−p) ≥ 10.
CLT is about the sample mean distribution, not the original variable. If X is heavily skewed, X stays skewed forever. Only x̄ becomes Normal as n grows.
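Both claims, E[x̄] = μ and SD(x̄) = σ/√n, can be seen in a quick simulation. A sketch drawing sample means from a skewed Exponential(1) parent (μ = σ = 1); the sample size, repetition count, and seed are arbitrary choices:

```python
import math
import random

# CLT sketch: sample means from a skewed parent (Exponential, μ = σ = 1).
random.seed(42)                      # reproducible run
n, reps = 50, 2000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(reps)]

avg = sum(means) / reps              # should be close to μ = 1
se = math.sqrt(sum((m - avg) ** 2 for m in means) / reps)
print(avg, se)                       # avg ≈ 1, se ≈ 1/√50 ≈ 0.141
```

The parent stays skewed no matter what; only the distribution of `means` tightens into a Normal(μ, σ²/n) shape as n grows.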
CI = point estimate ± (critical value) × (standard error)

| Parameter | CI Form | When |
|---|---|---|
| μ (σ known) | x̄ ± z*·(σ/√n) | rare, only if σ given |
| μ (σ unknown) | x̄ ± t*·(s/√n) | standard, df = n−1 |
| p (proportion) | p̂ ± z*·√(p̂(1−p̂)/n) | np̂ ≥ 10, n(1−p̂) ≥ 10 |
| μ₁−μ₂ (2-sample) | (x̄₁−x̄₂) ± t*·SE | compare 2 means |
90% CI → z* = 1.645; 95% → 1.96; 99% → 2.576

'There's a 95% chance μ is in (3.2, 4.8)' is wrong. μ is fixed; the interval is random. Better: 'we used a procedure that captures μ 95% of the time.'
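The CI template is a one-liner once the pieces are in hand. A sketch for the σ-known case (rare in practice, but the arithmetic is clearest); x̄, σ, and n are hypothetical:

```python
import math

# 95% CI for μ with σ known: x̄ ± z*·(σ/√n).
# Hypothetical numbers: x̄ = 4.0, σ = 2.0, n = 25; z* = 1.96 for 95%.
xbar, sigma, n, z_star = 4.0, 2.0, 25, 1.96

margin = z_star * sigma / math.sqrt(n)     # critical value × standard error
lo, hi = xbar - margin, xbar + margin
print(lo, hi)                              # 3.216 4.784
```

For σ unknown, swap in s and a t* critical value with df = n − 1, per the table above.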
| If you see… | Use § |
|---|---|
| 'P(A and B)', 'and / or' | §1 probability rules |
| 'P(A\|B)', 'given that' | §1 conditional / Bayes |
| 'mean of', 'expected value' | §2 distributions |
| '68/95/99.7', z-score | §2 Normal |
| 'sample mean', large n | §3 CLT |
| 'estimate μ with margin' | §4 confidence interval |
| 'is the mean equal to' | §5 t-test or z-test |
| 2 group comparison | §5 2-sample t-test |
| before/after same subjects | §5 paired t-test |
| 'is proportion equal to' | §5 proportion z-test |
| 3+ group means | §6 ANOVA |
| categorical data, 'fits a distribution' | §6 chi-square goodness |
| 2 categorical vars, 'related' | §6 chi-square independence |
| 'predict y from x' | §7 regression |
Match the data type: categorical → χ² or proportion. Numerical → t/z/ANOVA.
Match the question: 'estimate' → CI. 'is it equal' → hypothesis test. 'predict' → regression.
Match the # groups: 1 group → 1-sample. 2 groups → 2-sample. 3+ groups → ANOVA.
Always check assumptions: Normal? Independent? Sample size adequate?
Every test has assumptions (normality, independence, equal variance, etc.). Apply a t-test to grossly skewed data with n=8 and you're producing nonsense, not statistics. Always state and check.
With huge n, even tiny effects become 'statistically significant' (p < 0.05). Always also report effect size — does the difference matter in real life?