[Lecture Notes] CMU 14757 Intro to ML with Adversaries in Mind

Lec 1. Intro

Mean, Std and Var

  • Mean (np.mean())

    \mathrm{mean}(\{x_i\}) = \frac{1}{N} \sum_{i=1}^N x_i

  • Standard deviation (np.std())

    \mathrm{std}(\{x_i\}) = \sqrt{\frac{1}{N-1}\sum_{i=1}^N(x_i - \mathrm{mean}(\{x_i\}))^2}

  • Variance

    \mathrm{var}(\{x_i\}) = \frac{1}{N-1}\sum_{i=1}^N(x_i - \mathrm{mean}(\{x_i\}))^2 = \mathrm{std}(\{x_i\})^2
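
A quick NumPy sanity check of the three definitions above. Note that np.std() and np.var() use the 1/N (population) convention by default; pass ddof=1 to match the 1/(N-1) formulas used in these notes.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(x)
std = np.std(x, ddof=1)  # ddof=1 gives the 1/(N-1) sample formula above
var = np.var(x, ddof=1)  # the default ddof=0 would divide by N instead

print(mean, std, var)    # var equals std ** 2
```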

Standardizing data: shift and scale the data so that the mean is 0 and the standard deviation is 1

  • To standardize a dataset \{x\} into \{\hat{x}\}:

    \hat{x}_i = \frac{x_i - \mathrm{mean}(\{x\})}{\mathrm{std}(\{x\})}
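
A minimal sketch of this in NumPy (standardize is a hypothetical helper name; ddof=1 matches the sample-std convention above):

```python
import numpy as np

def standardize(x):
    """Shift and scale x so the result has mean 0 and sample std 1."""
    return (x - np.mean(x)) / np.std(x, ddof=1)

x_hat = standardize(np.array([1.0, 2.0, 3.0, 4.0]))
print(x_hat.mean(), x_hat.std(ddof=1))  # ~0.0 and 1.0
```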

Median: the 50th percentile

Interquartile range (IQR): the range covered by the middle 50% of the values, i.e. (75th percentile) - (25th percentile)

Correlation coefficient

  • Given a dataset \{(x,y)\}, first standardize \{x\} and \{y\} separately; then

    \mathrm{corr}(\{(x,y)\}) = \frac{1}{N-1} \sum_{i=1}^N \hat{x}_i \hat{y}_i

  • np.corrcoef(), pandas DataFrame.corr() (demonstrated below)

  • The correlation coefficient always lies in [-1, 1]

  • positive correlation, negative correlation, zero correlation
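
Both routes below should agree: the standardize-then-average formula above, and np.corrcoef() (the data is made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Manual route: standardize both, then average the products with 1/(N-1)
x_hat = (x - x.mean()) / x.std(ddof=1)
y_hat = (y - y.mean()) / y.std(ddof=1)
corr_manual = np.sum(x_hat * y_hat) / (len(x) - 1)

# Library route: np.corrcoef returns a 2x2 matrix; off-diagonal is corr(x, y)
corr_np = np.corrcoef(x, y)[0, 1]
print(corr_manual, corr_np)  # both 0.8 for this data
```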

Lec 2. Probability

Outcome: a possible result of a random experiment

Sample space Ω: the set of all possible outcomes

Event: an event E is a subset of the sample space Ω

Probability function: any function P that maps events to real numbers and satisfies:

  • P(E) \geq 0
  • P(\Omega) = 1
  • Probability of disjoint events is additive: P(E_1 \cup E_2 \cup \cdots \cup E_N) = \sum_{i=1}^N P(E_i) if E_i \cap E_j = \emptyset for all i \neq j

Independence: two events are independent if and only if

  • P(E_1 \cap E_2) = P(E_1) P(E_2)

  • Equivalently: knowing that E_1 occurred does not change the probability of E_2

Conditional Probability

  • P(E_2 | E_1) = \frac{P(E_1 \cap E_2)}{P(E_1)}

  • If the two events are independent, then P(E_2 | E_1) = P(E_2)

Bayes Rule

P(E_2|E_1) = \frac{P(E_1|E_2)P(E_2)}{P(E_1)}

Total Probability

P(E_1) = P(E_1 \cap E_2) + P(E_1 \cap E_2^c) = P(E_1|E_2)P(E_2) + P(E_1|E_2^c)P(E_2^c)
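
A quick numeric illustration combining total probability and Bayes rule; the disease-testing numbers below are made up for illustration (E_2 = "has disease", E_1 = "test positive"):

```python
p_disease = 0.01             # P(E2)
p_pos_given_disease = 0.95   # P(E1 | E2)
p_pos_given_healthy = 0.05   # P(E1 | E2^c)

# Total probability: P(E1) = P(E1|E2)P(E2) + P(E1|E2^c)P(E2^c)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes rule: P(E2|E1) = P(E1|E2) P(E2) / P(E1)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)   # ~0.16: a positive test is far from conclusive
```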

Conditional Independence: events E_1 and E_2 are conditionally independent given A if

P(E_1 \cap E_2 | A) = P(E_1 | A) P(E_2 | A)

Random Variable: a random variable is a function that maps outcomes to real numbers.

Probability distribution: P(X=x) is called the probability distribution of X. Also denoted as P(x) or p(x).

Joint probability distribution: P(\{X=x\} \cap \{Y=y\}), also denoted as P(x,y) or p(x,y)

Independence of random variables: random variables X and Y are independent if and only if P(x,y) = P(x)P(y) for all x and y

Conditional probability distribution:

P(x|y) = \frac{P(x,y)}{P(y)}

Bayes rule

P(x|y) = \frac{P(y|x)P(x)}{P(y)} = \frac{P(y|x)P(x)}{\sum_{x'} P(y|x')P(x')}

Expected value of a random variable

E[X] = \sum_x x P(x)

Variance of a random variable

\mathrm{var}[X] = E[(X - E[X])^2]

Standard deviation of a random variable

\mathrm{std}[X] = \sqrt{\mathrm{var}[X]}
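
For a discrete random variable, these definitions translate directly into code; a small sketch:

```python
import numpy as np

# A small discrete distribution: values and their probabilities
xs = np.array([0.0, 1.0, 2.0])
ps = np.array([0.2, 0.5, 0.3])

ex = np.sum(xs * ps)               # E[X] = sum_x x P(x)
var = np.sum((xs - ex) ** 2 * ps)  # var[X] = E[(X - E[X])^2]
std = np.sqrt(var)
print(ex, var, std)                # 1.1, 0.49, 0.7
```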

Useful probability distributions

  • Bernoulli distribution

    • P(X=1) = p, P(X=0) = 1-p
    • E[X] = p
    • \mathrm{var}[X] = p(1-p)
  • Binomial distribution

    • P(X = k) = \binom{N}{k} p^k (1-p)^{N-k} for integer 0 \leq k \leq N
    • E[X] = Np
    • \mathrm{var}[X] = Np(1-p)
  • Multinomial distribution

    • P(X_1 = n_1, X_2 = n_2, \dots, X_k = n_k) = \frac{N!}{n_1! n_2! \dots n_k!} p_1^{n_1} p_2^{n_2} \dots p_k^{n_k}

      where N = n_1 + n_2 + \cdots + n_k

  • Poisson distribution

    • A discrete random variable X is Poisson with intensity \lambda if

      P(X=k) = \frac{e^{-\lambda}\lambda^k}{k!}

      for integer k \geq 0

    • E[X] = \lambda

    • \mathrm{var}[X] = \lambda
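
A sampling sanity check for two of the distributions above, using NumPy's random generator: the empirical mean and variance should approach the formulas listed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Binomial(N=10, p=0.3): mean should be Np = 3, variance Np(1-p) = 2.1
s = rng.binomial(n=10, p=0.3, size=100_000)
print(s.mean(), s.var())  # ~3.0, ~2.1

# Poisson(lambda=4): mean and variance should both be ~4
s = rng.poisson(lam=4, size=100_000)
print(s.mean(), s.var())  # ~4.0, ~4.0
```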

Lec 3. Classification and Naive Bayes

Binary classifier

Multiclass classifier

Nearest neighbors classifier

  • variants: k-nearest neighbors; (k,l)-nearest neighbors (look at the labels of the k nearest points and assign a label only if at least l of them agree); see the sketch below
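
A minimal k-NN sketch (knn_predict is a hypothetical helper; it assumes NumPy arrays and Euclidean distance):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training points closest to x."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances to x
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]
```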

Performance of a binary classifier

  • false positive (truth is negative, but the classifier assigns positive); false negative (truth is positive, but the classifier assigns negative)
  • class confusion matrix: a 2×2 matrix of counts: true positives (TP), false negatives (FN), false positives (FP), true negatives (TN); see the sketch below
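
A minimal sketch of building the 2×2 matrix by hand, assuming 0/1 labels and the rows-are-truth convention (confusion_matrix here is a hypothetical helper, not the sklearn function):

```python
import numpy as np

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for 0/1 labels: rows = truth, cols = prediction.
    m[1,1] = TP, m[1,0] = FN, m[0,1] = FP, m[0,0] = TN."""
    m = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m
```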

Cross-validation: split the data into a training set and a test set

Naive Bayes classifier: a probabilistic method:

  • Training: use the training data \{(\mathbf{x}_i, y_i)\} to estimate a probability model P(y|\mathbf{x})
  • Classification: given a feature vector \mathbf{x}, predict label = \arg \max_y P(y|\mathbf{x})
  • Naive Bayes assumption: the features of \mathbf{x} are conditionally independent given the class label y, so P(\mathbf{x}|y) = \prod_j P(x_j|y); see the sketch below
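
A minimal sketch of a Naive Bayes classifier, assuming binary (0/1) features modeled as per-feature Bernoullis with Laplace smoothing; the course's exact estimator may differ:

```python
import numpy as np

class NaiveBayes:
    """Naive Bayes for binary feature vectors: P(x|y) = prod_j P(x_j|y)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        # Class priors P(y), and P(x_j = 1 | y) per feature (Laplace smoothing)
        self.prior = np.array([(y == c).mean() for c in self.classes])
        self.theta = np.array([
            (X[y == c].sum(axis=0) + 1) / ((y == c).sum() + 2)
            for c in self.classes
        ])
        return self

    def predict(self, X):
        # Work in log space: log P(y) + sum_j log P(x_j | y), then argmax over y
        log_p = (np.log(self.prior)
                 + X @ np.log(self.theta).T
                 + (1 - X) @ np.log(1 - self.theta).T)
        return self.classes[np.argmax(log_p, axis=1)]

# Toy usage: the classifier recovers the training labels on this tiny dataset
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array([1, 1, 0, 0])
print(NaiveBayes().fit(X, y).predict(X))  # [1, 1, 0, 0]
```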

Lec 4. Adversarial Spam Filtering

