[Module 4] Supervised Learning: Linear Regression Model

Notice

Recent Posts

Recent Comments

Link

« 2025/05 »
일	월	화	수	목	금	토
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

Tags more

Archives

Today

Total

관리 메뉴

juooo1117

[Module 4] Supervised Learning: Linear Regression Model 본문

Artificial Intelligence/LG Aimers: AI전문가과정

[Module 4] Supervised Learning: Linear Regression Model

Hyo__ni 2024. 1. 9. 19:18

Part2. Linear Regression Model

Linear models

모델의 출력이 real 이고 연속인 값으로 구성된다.

모델은 ouput h(x)와 정답인 y 의 차이(error)를 최소화 하게끔 학습이 진행된다.

연속되는 출력을 예측하고 추론하기 위해서 dataset에서 입력과 정답으로 구성되어 있는(즉, labeling 되어있는 정답이 있는 값)

- linear combination 으로 구성되어 있다.

- 하지만, 선형모델이라고 해서 반드시 입력 변수가 선형일 필요는 없다. → 2번째 식의 경우, x에 대해서는 선형식이 아니다.

Feature organization

모델에 포함되는 입력 변수의 개수에(factor) 따라서 구분

univariate problem : 1개의 입력변수(input feature)만 존재한다.

multivariate problem : 다양한 변수를 선형 모델에 포함하는 것

Linear Regression → 주어진 입력에 대해서 출력과의 선형적인 관계를 추론하는 문제이다.

주어진 입력과 출력의 쌍으로서 선형 모델을 학습하게 되고, 새로운 입력이 들어왔을 때 output을 산출하는 모델이다.

Linear Regression framework

- Which predictor? : Hypothesis class

- How good is a predictor? : Loss function

- How to compute the best predictor? : Optimization algorithm

Parameter optimization

L2 Cost Function → goal: minimizing MSE (가장 data에 fitting하도록 하는 선형모델을 찾는 것)

Least Square Problem (Normal Equation)

최적화 파라미터 θ → cost function을 가장 최소화하는 것

- θ 에 관한 derivative term을 구한 다음, 0이 되도록 하는 θ 값을 구하면, 바로 이것이 최적화된 파라미터 값을 구하는 것!

- θ* : solution to linear regression

- Derived by minimizing Eθ over all possible θ ∈ 𝐑𝒅+𝟏

- E : continuous, differentiable, convex

하지만, 이렇게 normal equation을 이용해서 해를 구하는 것은 최근 머신러닝처럼 data sample 개수가 늘어나는 경우에는 매우 비효율적인 문제를 야기한다.

- What if the dimension of the input vector hugely increases ( → huge computational complexity)?

- What if the matrix is not invertible?

따라서, Iterative 하게 최적의 파라미터(θ)를 찾아가는 algorithm 이 필요하다. 즉, Gradient descent 의 필요성!

**Gradient : 함수를 미분하여 얻는 term으로, 해당 함수의 변화하는 정도를 표현하는 값 (the derivative of vector functions)

Iterative optimization by Gradient Descent

Gradient 가 0이 될 때까지 반복적으로 θ를 바꾸어 나가면서 탐색을 해 나가는 것

함수의 변화도가 가장 큰 방향으로 이동하는 것을 반복한다. (Direction of greatest increase or decrease of a function)

- 𝛼 : step size (parameter를 업데이트하는 속도를 조절할 때 사용하는 값)

- Greedy algorithm의 특성 : gradient descent algorithm은 경우에 따라서 Local optimum 만을 달성하기가 쉽다.

Method to solve numerically

The function 𝐽 : objective function that we want to optimize.

𝛼: the step size to control the rate to move down the error surface. → hyper parameter, which is a positive number (c.f. 𝜃 is a learnable parameter)

Start with initial parameters θ(ɵ0, ɵ1)

Keep changing the parameters to reduce 𝐽 until achieving the minimal cost (convergence 할 때까지 반복한다.)

Gradient Descent algorithm for linear regression

Gradient Descent : 여러번 반복적인 과정을 통해 해를 얻는다.

Normal Equation : 한번에 one-step으로 해를 얻을 수 있지만, sample size가 늘어나면 inverse matrix를 구하기 어렵다.

'Artificial Intelligence > LG Aimers: AI전문가과정' 카테고리의 다른 글

[Module 4] Supervised Learning: Linear Classification (0)	2024.01.10
[Module 4] Supervised Learning: Gradient Descent (1)	2024.01.10
[Module 3] Machine Learning 개론: Recent Progress of Large Language Models (1)	2024.01.07
[Module 3] Machine Learning 개론: Bias and Variance (1)	2024.01.07
[Module 3] Machine Learning 개론: Introduction to ML (0)	2024.01.06

'Artificial Intelligence/LG Aimers: AI전문가과정' Related Articles

juooo1117

[Module 4] Supervised Learning: Linear Regression Model 본문

[Module 4] Supervised Learning: Linear Regression Model

Part2. Linear Regression Model

'Artificial Intelligence > LG Aimers: AI전문가과정' 카테고리의 다른 글

티스토리툴바