juooo1117

[Data Science Math Skills: Week1] Sets, Numbers, Sigma Notation



Hyo__ni 2024. 4. 12. 16:22

Sets

2 ∈ A : "2 is an element of A"

8 ∉ A : "8 is not an element of A"

 

Cardinality: The cardinality (size) of a set is the number of elements in it.

  |A| = 4  : "there are 4 elements in A, so the cardinality is 4"
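As a quick check of the notation above, here is a minimal Python sketch. The contents of `A` are a hypothetical choice (the set was defined in a figure, not in the text); any set with four elements that contains 2 and not 8 would behave the same way.

```python
# Hypothetical example set: the original definition of A is not in the text.
A = {1, 2, -3, 7}

print(2 in A)      # membership test, i.e. "2 ∈ A"
print(8 not in A)  # non-membership test, i.e. "8 ∉ A"
print(len(A))      # cardinality |A|
```

Python's built-in `set` ignores duplicates and order, which matches the mathematical definition of a set.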

  

 

Example using set theory

Let X be the set of people in a clinical trial, and let VBS stand for "very bad syndrome". Then:

  S = {𝒳 ∈ X : 𝒳 has VBS}

  H = {𝒳 ∈ X : 𝒳 does not have VBS}

  (Assume that X = S ∪ H and S ∩ H = ∅.)

 

Adding the concept of a test to the setup above

  P = {𝒳 ∈ X : 𝒳 tests positive for VBS}

  N = {𝒳 ∈ X : 𝒳 tests negative for VBS}

  (Assume that X = P ∪ N and P ∩ N = ∅.)
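The set-builder definitions above map directly onto Python set comprehensions. The sketch below uses made-up trial data (the `trial` dictionary and the person ids are hypothetical) and checks that S, H and P, N each split X into two disjoint parts.

```python
# Hypothetical trial data: person id -> (has_vbs, tests_positive)
trial = {
    "p1": (True, True),
    "p2": (True, False),   # has VBS but tests negative
    "p3": (False, True),   # does not have VBS but tests positive
    "p4": (False, False),
    "p5": (False, False),
}

X = set(trial)                             # everyone in the trial
S = {x for x in X if trial[x][0]}          # has VBS
H = {x for x in X if not trial[x][0]}      # does not have VBS
P = {x for x in X if trial[x][1]}          # tests positive
N = {x for x in X if not trial[x][1]}      # tests negative

# Each pair covers X (union) and has no overlap (empty intersection).
print(S | H == X, S & H == set())
print(P | N == X, P & N == set())
```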

Cardinality

 

Venn diagrams for visualization

Inclusion-exclusion formula : |A ∪ B| = |A| + |B| - |A ∩ B|

 ⇒ Cardinality of A union B (|A ∪ B|) equals cardinality of A (|A|) plus the cardinality of B (|B|) minus the cardinality of A intersect B (|A ∩ B|)

This confirms that some proportion of the people fall into the false negative and false positive categories.
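A small numeric check of the inclusion-exclusion formula, using the clinical trial sets as an example. The memberships below are hypothetical; the false negatives are taken to be S ∩ N (has VBS, tests negative) and the false positives H ∩ P (no VBS, tests positive).

```python
# Hypothetical memberships for a five-person trial.
S = {"p1", "p2"}            # has VBS
H = {"p3", "p4", "p5"}      # does not have VBS
P = {"p1", "p3"}            # tests positive
N = {"p2", "p4", "p5"}      # tests negative
X = S | H

# Inclusion-exclusion: |S ∪ P| = |S| + |P| - |S ∩ P|
lhs = len(S | P)
rhs = len(S) + len(P) - len(S & P)
print(lhs == rhs)

# People the test gets wrong.
false_negatives = S & N
false_positives = H & P
print(len(false_negatives) / len(X))  # share of X that is a false negative
print(len(false_positives) / len(X))  # share of X that is a false positive
```

Subtracting |S ∩ P| is needed because people who both have VBS and test positive would otherwise be counted twice.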

Numbers

 

Some real numbers have a terminating decimal expansion, and some do not.

  𝝅  = 3.141592... is an irrational number (its decimal expansion neither terminates nor repeats!)

 

Sets of real numbers

 

Inequalities; introduction to symbols

  a < b : "a is less than b" 

  x > y : "x is greater than y"

  c ≤ d : "c is less than or equal to d"

  z ≥ w : "z is greater than or equal to w"

  e ≪ f : "e is much, much less than f"  → *not a precisely defined relation, but used frequently in data science

 

 

Interval Notation

Closed intervals: [2, 3.1] → {x ∈ ℝ : 2 ≤ x ≤ 3.1}

Open intervals: (5, 8) → {x ∈ ℝ : 5 < x < 8}

Half-open intervals: (-7.1, 15]  → {x ∈ ℝ : -7.1 < x ≤ 15}

Rays: [2, ∞) → {x ∈ ℝ : x ≥ 2}  /  (-∞, 7.1) → {x ∈ ℝ : x < 7.1}
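One way to make the bracket conventions concrete is a small membership check. The helper `in_interval` below is a hypothetical function written for this note, not part of any library; a square bracket corresponds to `closed=True` and a round bracket to `closed=False`.

```python
import math

def in_interval(x, lo, hi, closed_lo, closed_hi):
    """Return True if x lies between lo and hi.

    closed_lo / closed_hi say whether each endpoint is included
    (square bracket) or excluded (round bracket).
    """
    above = x >= lo if closed_lo else x > lo
    below = x <= hi if closed_hi else x < hi
    return above and below

print(in_interval(2, 2, 3.1, True, True))              # [2, 3.1] includes 2
print(in_interval(5, 5, 8, False, False))              # (5, 8) excludes 5
print(in_interval(1, 0, 1, False, True))               # (0, 1] includes 1 (made-up interval)
print(in_interval(1e9, 2, math.inf, True, False))      # ray [2, ∞)
print(in_interval(7.1, -math.inf, 7.1, False, False))  # ray (-∞, 7.1) excludes 7.1
```

Infinite endpoints are always written with a round bracket, since ∞ is not a real number and can never be included.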

 

Sigma

∑ : tells you to sum the terms that follow, one for each value of the index

 

Simplification Rules

Distributive Property : a(b+c) = ab + ac

  *In other words, constants inside the summed expression can be pulled outside.

 

Commutative Property : a + b = b + a

  *In other words, we can add the terms in any order.
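Both simplification rules can be checked numerically. The lists `x` and `y` and the constant `c` below are made-up values chosen only for illustration.

```python
x = [1, 4, 9, 16]   # hypothetical data
y = [2, 0, -1, 5]   # hypothetical data
c = 3               # a constant that does not depend on the index

# Distributive property: sum of c*x_i equals c times the sum of x_i.
print(sum(c * xi for xi in x) == c * sum(x))

# Commutative property: the order of the terms does not matter,
# so a sum of (x_i + y_i) can be split into two separate sums.
print(sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y))
print(sum(reversed(x)) == sum(x))
```

Integers are used here so the comparisons are exact; with floating-point values the two sides can differ by rounding error.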

 

 

Mean and Variance

  the symbol 𝜇 denotes the "mean of 𝑥"

  𝜎² is the "variance of 𝑥"

  the standard deviation is denoted "𝜎"
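The formulas themselves are not written out in these notes, so the sketch below assumes the population definitions: the mean is the sum of the values divided by n, and the variance is the average squared distance from the mean (dividing by n, not n - 1). The data values are made up.

```python
import math

x = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
n = len(x)

# Mean: (1/n) * sum of x_i
mu = sum(x) / n

# Variance (population form, an assumption): (1/n) * sum of (x_i - mu)^2
var = sum((xi - mu) ** 2 for xi in x) / n

# Standard deviation: square root of the variance
sigma = math.sqrt(var)

print(mu, var, sigma)
```

Each `sum(...)` call plays the role of a ∑ over the index i.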

 

Mean Centering

*Mean centering (subtracting the mean from every value) produces a new data set that keeps the same relationships between the values, but has a mean of zero.

Z and W have the same mean, but Z is more spread out than W,

 → so the variance of Z should be greater than that of W.
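A minimal sketch of both ideas. The helpers `mean` and `variance` are written here for illustration and assume the population form of the variance; the data sets `x`, `Z` and `W` are made-up values, not the ones from the original figures.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Population variance (an assumption): average squared distance from the mean.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Mean centering: subtract the mean from every value.
x = [2, 4, 6, 8]
centered = [xi - mean(x) for xi in x]
print(centered)                             # gaps between values are unchanged
print(mean(centered))                       # the new mean is zero
print(variance(centered) == variance(x))    # the spread is unchanged

# Same mean, different spread.
Z = [-10, 0, 10]
W = [-1, 0, 1]
print(mean(Z) == mean(W))
print(variance(Z) > variance(W))
```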