[Coursera Stanford Machine Learning (week 3)] Logistic Regression Model

sunnyshiny 2023. 2. 6. 18:42

These are my notes from Professor Andrew Ng's Machine Learning course on Coursera.

Cost Function

Cost Function for logistic regression

$$ J(\theta)=\frac{1}{m}\sum_{i=1}^{m} Cost(h_{\theta}(x^{(i)}), y^{(i)})\\ Cost(h_{\theta}(x), y)=\begin{cases} -\log(h_{\theta}(x)) & \text{if } y=1\\ -\log(1-h_{\theta}(x)) & \text{if } y=0\end{cases} $$

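When $y=1$, the cost is 0 if $h_{\theta}(x)=1$, and it grows toward infinity as $h_{\theta}(x)$ approaches 0.

When $y=0$, the cost is likewise 0 if $h_{\theta}(x)=0$, and it grows toward infinity as $h_{\theta}(x)$ approaches 1. In other words, a confident but wrong prediction is penalized very heavily.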
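
A minimal Python sketch of the two-case cost, assuming h is the hypothesis output for a single example and y is its 0/1 label (the function name cost_piecewise is hypothetical). It prints the cost for a few values of h so the behavior described above can be checked numerically.

```python
import math

def cost_piecewise(h, y):
    """Per-example logistic regression cost in its two-case form.

    h is the hypothesis output h_theta(x) in [0, 1]; y is the label (0 or 1).
    The log argument must be positive, so h = 0 with y = 1 (or h = 1 with y = 0)
    raises ValueError here, which mirrors the cost going to infinity.
    """
    if y == 1:
        return -math.log(h)
    if y == 0:
        return -math.log(1 - h)
    raise ValueError("y must be 0 or 1")

# y = 1: the cost falls toward 0 as h approaches 1 and blows up as h approaches 0
for h in (0.99, 0.5, 0.01):
    print(f"y=1, h={h}: cost={cost_piecewise(h, 1):.3f}")

# y = 0: the mirror image
for h in (0.01, 0.5, 0.99):
    print(f"y=0, h={h}: cost={cost_piecewise(h, 0):.3f}")
```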

Simplified Cost Function and Gradient descent

Written as a single expression, the cost function for logistic regression is

$$ Cost(h_{\theta}(x),y)=-y\log(h_{\theta}(x))-(1-y)\log(1-h_{\theta}(x)) $$

If $y=1$, only $-\log(h_{\theta}(x))$ remains; if $y=0$, only $-\log(1-h_{\theta}(x))$ remains.
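
The claim that the single expression reduces to the two cases can be checked directly. This is a sketch with a hypothetical function name cost; note that, unlike the two-case version, the code evaluates both logarithms, so it assumes h lies strictly between 0 and 1 (in practice h is often clipped slightly away from 0 and 1 for that reason).

```python
import math

def cost(h, y):
    """Single-expression cost: -y*log(h) - (1-y)*log(1-h).

    Assumes 0 < h < 1 so both logarithms are defined; y must be 0 or 1.
    """
    return -y * math.log(h) - (1 - y) * math.log(1 - h)

# For y = 1 the second term vanishes; for y = 0 the first one does,
# so the result matches the two-case definition.
for h in (0.1, 0.5, 0.9):
    assert math.isclose(cost(h, 1), -math.log(h))
    assert math.isclose(cost(h, 0), -math.log(1 - h))
print("single expression matches both cases")
```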

Over the whole training set of $m$ examples, the cost function becomes

$$ J(\theta)=\frac{1}{m}\sum_{i=1}^m Cost(h_{\theta}(x^{(i)}),y^{(i)})\\=-\frac{1}{m}\sum_{i=1}^m\left[y^{(i)}\log(h_{\theta}(x^{(i)}))+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right] $$
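
A loop-based sketch of $J(\theta)$, assuming each input x already includes the bias feature $x_0=1$. The toy data set and the names hypothesis and J are hypothetical, chosen only for illustration. With $\theta=0$ every prediction is 0.5, so $J$ equals $\log 2$ regardless of the data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); x is assumed to already include x_0 = 1
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def J(theta, X, y):
    # -(1/m) * sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]
    m = len(X)
    total = 0.0
    for x_i, y_i in zip(X, y):
        h = hypothesis(theta, x_i)
        total += y_i * math.log(h) + (1 - y_i) * math.log(1 - h)
    return -total / m

# Hypothetical toy data: bias term x_0 = 1 plus one feature
X = [[1.0, -1.5], [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
print(J([0.0, 0.0], X, y))  # theta = 0 gives h = 0.5 everywhere, so J = log 2 (about 0.6931)
print(J([0.0, 2.0], X, y))  # a theta that separates the classes has a lower cost
```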

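Gradient descent finds the parameters $\theta$ that minimize the cost function $J(\theta)$ by repeating the following update, applied to all $\theta_j$ simultaneously: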

$$ \min_{\theta}J(\theta):\\ \theta_j:=\theta_j-\alpha \frac{\partial}{\partial \theta_j} J(\theta)\\=\theta_j-\frac{\alpha}{m} \sum_{i=1}^m(h_{\theta}(x^{(i)})-y^{(i)})x_j^{(i)} $$
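
A minimal batch gradient descent sketch of the update rule above, written in plain Python rather than the Octave used in the course. The names gradient_step and gradient_descent, the toy data, the learning rate alpha=0.1 and the 1000 iterations are all assumptions for illustration. Computing every new $\theta_j$ from the old $\theta$ and returning a fresh list is what makes the update simultaneous.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x); x already contains the bias feature x_0 = 1
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def gradient_step(theta, X, y, alpha):
    """theta_j := theta_j - (alpha/m) * sum_i (h_theta(x_i) - y_i) * x_ij for every j.

    The errors are computed once from the old theta and a new list is returned,
    so every component is updated simultaneously.
    """
    m = len(X)
    errors = [hypothesis(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
    return [
        theta_j - (alpha / m) * sum(e * x_i[j] for e, x_i in zip(errors, X))
        for j, theta_j in enumerate(theta)
    ]

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    theta = [0.0] * len(X[0])  # start from theta = 0
    for _ in range(iterations):
        theta = gradient_step(theta, X, y, alpha)
    return theta

# Hypothetical toy data: x_0 = 1 plus one feature
X = [[1.0, -1.5], [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = gradient_descent(X, y)
print([round(hypothesis(theta, x), 3) for x in X])
```

Predicting $y=1$ whenever $h_{\theta}(x)\ge 0.5$ recovers the training labels for this separable toy set.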

This update rule looks identical to the one for linear regression; the only difference is the hypothesis function.

Linear regression : $h_{\theta}(x) = \theta^Tx$

Logistic regression : $h_{\theta}(x)= \frac {1}{1+e^{-\theta^Tx}}$
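
To illustrate that only the hypothesis changes, here is a sketch of one generic update step that takes the hypothesis as a parameter. The names linear_h, logistic_h and update are hypothetical. The linear hypothesis is unbounded, while the logistic one always lies in (0, 1).

```python
import math

def linear_h(theta, x):
    # Linear regression: h_theta(x) = theta^T x
    return sum(t * xj for t, xj in zip(theta, x))

def logistic_h(theta, x):
    # Logistic regression: h_theta(x) = 1 / (1 + e^(-theta^T x))
    return 1.0 / (1.0 + math.exp(-linear_h(theta, x)))

def update(theta, X, y, alpha, h):
    """Generic step theta_j := theta_j - (alpha/m) * sum_i (h(x_i) - y_i) * x_ij.

    The same code serves both models; only the hypothesis function h differs.
    """
    m = len(X)
    errors = [h(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
    return [t - (alpha / m) * sum(e * x_i[j] for e, x_i in zip(errors, X))
            for j, t in enumerate(theta)]

theta = [0.5, 2.0]
x = [1.0, 3.0]
print(linear_h(theta, x))    # 6.5, not restricted to any range
print(logistic_h(theta, x))  # about 0.9985, always between 0 and 1
```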

In vectorized form, the logistic regression algorithm becomes

$$ h=g(X\theta)\\J(\theta)=\frac{1}{m}\left(-y^T\log(h)-(1-y)^T\log(1-h)\right)\\\theta:= \theta-\frac{\alpha}{m} X^T(g(X\theta)-\vec{y}) $$
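
A dependency-free sketch that mirrors the matrix formulas one to one, using small hypothetical list helpers (g, matvec, transpose) in place of a linear algebra library; with NumPy, matvec would simply be the @ operator. The toy design matrix, alpha=0.1 and 200 iterations are assumptions for illustration.

```python
import math

def g(z):
    # Sigmoid applied element-wise to a vector stored as a list
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def matvec(A, v):
    # Matrix-vector product A v, with A stored as a list of rows
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def cost_vec(theta, X, y):
    # J = (1/m) * (-y^T log(h) - (1 - y)^T log(1 - h)), with h = g(X theta)
    m = len(y)
    h = g(matvec(X, theta))
    return (-sum(yi * math.log(hi) for yi, hi in zip(y, h))
            - sum((1 - yi) * math.log(1 - hi) for yi, hi in zip(y, h))) / m

def step_vec(theta, X, y, alpha):
    # theta := theta - (alpha/m) * X^T (g(X theta) - y)
    m = len(y)
    residual = [hi - yi for hi, yi in zip(g(matvec(X, theta)), y)]
    grad = matvec(transpose(X), residual)
    return [t - (alpha / m) * gj for t, gj in zip(theta, grad)]

# Hypothetical design matrix X (first column is x_0 = 1) and labels y
X = [[1.0, -1.5], [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
theta = [0.0, 0.0]
for _ in range(200):
    theta = step_vec(theta, X, y, alpha=0.1)
print(theta, cost_vec(theta, X, y))
```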

Multi-class Classification: One-vs-all

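So far we have dealt with binary classification problems, which distinguish between only two cases, such as spam vs. normal email or malignant vs. benign tumor. Multi-class classification refers to problems with three or more classes.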

Examples:

E-mail folder/tag: work, family, hobby

Medical diagnosis: healthy, cold, flu

Weather: sunny, cloudy, rain, snow

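One-vs-all trains a separate binary classifier for each class, treating that class as positive and all remaining classes as negative. Each resulting hypothesis $h_{\theta}^{(i)}(x)$ estimates the probability that $y$ belongs to class $i$, i.e. $h_{\theta}^{(i)}(x)=P(y=i \mid x;\theta)$. To classify new (test) data $x$, pick the class whose hypothesis gives the largest value, $\max_i h_{\theta}^{(i)}(x)$.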
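
A minimal one-vs-all sketch in plain Python: one logistic regression classifier is trained per class with the labels relabeled to 1 (this class) and 0 (everything else), and prediction takes the class with the largest hypothesis value. The names train_binary, one_vs_all and predict, the three-cluster toy data, alpha=0.1 and 1000 iterations are all hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    # h_theta(x) = g(theta^T x); x includes the bias feature x_0 = 1
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

def train_binary(X, y, alpha=0.1, iterations=1000):
    # Plain batch gradient descent on the logistic cost (same update rule as above)
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        errors = [h(theta, x_i) - y_i for x_i, y_i in zip(X, y)]
        theta = [theta[j] - (alpha / m) * sum(e * x_i[j] for e, x_i in zip(errors, X))
                 for j in range(n)]
    return theta

def one_vs_all(X, labels, classes):
    # One binary classifier per class: y = 1 for that class, 0 for all the others
    return {c: train_binary(X, [1 if lab == c else 0 for lab in labels]) for c in classes}

def predict(classifiers, x):
    # Pick the class whose hypothesis h^(c)(x) is largest
    return max(classifiers, key=lambda c: h(classifiers[c], x))

# Hypothetical 2-D toy data with three well-separated clusters; x_0 = 1 is prepended
points = [(-3, -2), (-4, -2), (-3, -3),   # class 0
          (3, -2), (4, -2), (3, -3),      # class 1
          (0, 3), (1, 4), (-1, 4)]        # class 2
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
X = [[1.0, float(a), float(b)] for a, b in points]

classifiers = one_vs_all(X, labels, classes=[0, 1, 2])
print(predict(classifiers, [1.0, -3.5, -2.5]))  # → 0
print(predict(classifiers, [1.0, 0.0, 5.0]))    # → 2
```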

Reference
Machine Learning, Coursera, Andrew Ng

http://amsi.org.au/ESA_Senior_Years/SeniorTopic3/3h/3h_2 content_2.html

Sunny Finance & Tech Blog