[Coursera Stanford Machine Learning (week4)] Neural Networks

sunnyshiny 2023. 2. 8. 19:01

These are my notes on Andrew Ng's Machine Learning course on Coursera.

Model Representation

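To express a hypothesis function with a neural network, start from a simple model of a neuron: a computational unit that receives electrical inputs ("spikes") and channels them to an output.

The dendrites correspond to the input features $x_1, x_2, \dots, x_n$, and the output is the result of the hypothesis function. In this model, $x_0$ is the bias unit and always has the value 1. The neural network uses the logistic function $\frac{1}{1+e^{-\theta^Tx}}$, which is called the sigmoid activation function. The parameters $\theta$ are called weights.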
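As a minimal sketch of the single logistic unit described above, the following Python computes $g(\theta^Tx)$ with the bias unit $x_0 = 1$ prepended. The function names `sigmoid` and `neuron_output` are assumptions made for illustration and do not come from the course.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic (sigmoid) activation g(z) = 1 / (1 + e^(-z)).

    Not guarded against overflow for very large negative z.
    """
    return 1.0 / (1.0 + math.exp(-z))


def neuron_output(theta: List[float], x: List[float]) -> float:
    """Single logistic unit: h_theta(x) = g(theta^T x).

    theta has one more entry than x because theta[0] multiplies
    the bias unit x_0 = 1, which is prepended here.
    """
    inputs = [1.0] + x
    z = sum(t * xi for t, xi in zip(theta, inputs))
    return sigmoid(z)
```

With weights that cancel out, for example `neuron_output([0.0, 1.0, -1.0], [2.0, 2.0])`, the weighted sum is 0 and the unit outputs 0.5.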

Simple representation: $[x_0 \; x_1 \; x_2] \rightarrow [ \qquad ] \rightarrow h_\theta(x)$

The input nodes form the input layer, and the layer in which the hypothesis function computes the output is called the output layer. The nodes between them form the hidden layer. The hidden-layer nodes $a_0^{(2)}, \dots, a_n^{(2)}$ are called activation units.

$a_i^{(j)}$ : activation of unit $i$ in layer $j$

$Θ^{(j)}$ : matrix of weights controlling function mapping from layer $j$ to layer $j+1$

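In a network with three inputs and three hidden units, $\Theta^{(1)}$ is a 3×4 matrix. Each activation node is computed by multiplying one row of parameters by the inputs (including the bias unit) and applying the activation function. The hypothesis output is computed the same way: the logistic function is applied to the sum of the hidden-layer activation nodes weighted by that layer's parameters.

If layer $j$ has $s_j$ units and layer $j+1$ has $s_{j+1}$ units, then $\Theta^{(j)}$ has dimension $s_{j+1}\times (s_j+1)$. The "+1" comes from the bias node. For example, if layer 1 has 2 input nodes and layer 2 has 4 activation nodes, then $\Theta^{(1)}$ has dimension $4 \times 3$.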
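The dimension rule can be checked with a small helper. This is only a sketch; the name `theta_shape` is a hypothetical helper introduced here, not something defined in the course.

```python
from typing import Tuple


def theta_shape(s_j: int, s_next: int) -> Tuple[int, int]:
    """Shape of Theta^(j) mapping layer j (s_j units) to layer j+1 (s_next units).

    Rows: one per unit in layer j+1.
    Columns: one per unit in layer j, plus one for the bias node.
    """
    return (s_next, s_j + 1)
```

For the example above, `theta_shape(2, 4)` gives `(4, 3)`, and a layer with 3 inputs feeding 3 hidden units gives `(3, 4)`.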

$$ \begin{align*} a_1^{(2)} &= g(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3) \newline a_2^{(2)} &= g(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3) \newline a_3^{(2)} &= g(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3) \newline h_\Theta(x) = a_1^{(3)} &= g(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}) \end{align*} $$

$$ \begin{align*} z_1^{(2)} &= \Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3 \newline z_2^{(2)} &= \Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3 \newline z_3^{(2)} &= \Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3 \end{align*} $$

$$ \Rightarrow \begin{align*} a_1^{(2)} &= g(z_1^{(2)}) \newline a_2^{(2)} &= g(z_2^{(2)}) \newline a_3^{(2)} &= g(z_3^{(2)}) \end{align*} $$
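To make the hidden-layer equations concrete, here is a rough sketch that computes each $z_i^{(2)}$ as a row-wise weighted sum and then applies $g$. The weight values in `theta1` and the helper name `hidden_layer` are made up for illustration; they are not from the course.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic activation g(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def hidden_layer(theta1: List[List[float]], x: List[float]) -> Tuple[List[float], List[float]]:
    """Return (z2, a2) for the hidden layer, without the bias unit a_0^(2)."""
    a1 = [1.0] + x  # prepend the bias unit x_0 = 1
    # z_i^(2) = Theta_i0 * x_0 + Theta_i1 * x_1 + ... (one sum per row of Theta^(1))
    z2 = [sum(w * a for w, a in zip(row, a1)) for row in theta1]
    a2 = [sigmoid(z) for z in z2]
    return z2, a2


# Hypothetical 3x4 weights: 3 hidden units, 3 inputs plus the bias column.
theta1 = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, -1.0],
]
z2, a2 = hidden_layer(theta1, [0.0, 2.0, 1.0])
```

With these made-up numbers the weighted sums are 0, 2 and 0, so the first and third activations are exactly 0.5.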

In vector form:

$$ x = \begin{bmatrix} x_0 \newline x_1 \newline \vdots \newline x_n \end{bmatrix}, \qquad z^{(j)} = \begin{bmatrix} z_1^{(j)} \newline z_2^{(j)} \newline \vdots \newline z_n^{(j)} \end{bmatrix} $$

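If we write the input layer as $x=a^{(1)}$, then $z^{(2)}=\Theta^{(1)}a^{(1)}$, which generalizes to $z^{(j+1)}=\Theta^{(j)}a^{(j)}$, where $a^{(j)}$ includes the bias unit $a_0^{(j)}=1$.

The final output layer is $h_\Theta(x)=a^{(j+1)}=g(z^{(j+1)})$, which is the same computation as logistic regression, applied between layer $j$ and layer $j+1$.

Stacking more layers makes it possible to build more complex non-linear hypotheses.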
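The recurrence $z^{(j+1)}=\Theta^{(j)}a^{(j)}$, $a^{(j+1)}=g(z^{(j+1)})$ can be sketched as a loop over the weight matrices. The code below is an illustrative sketch, not the course's implementation: `forward_propagate` is a hypothetical name, and the weights are assumptions chosen so that the two hidden units act roughly like AND and NOR and the output unit like OR. Together they compute XNOR, a function that a single logistic unit cannot represent.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Logistic activation g(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward_propagate(thetas: List[Matrix], x: List[float]) -> List[float]:
    """Apply a^(j+1) = g(Theta^(j) a^(j)) for each weight matrix in order."""
    a = list(x)  # a^(1) = x, bias added inside the loop
    for theta in thetas:
        a_with_bias = [1.0] + a  # bias unit a_0 = 1
        z = [sum(w * v for w, v in zip(row, a_with_bias)) for row in theta]
        a = [sigmoid(v) for v in z]
    return a


# Illustrative weights (assumed, not from the notes above).
theta1 = [[-30.0, 20.0, 20.0],   # hidden unit 1: approximately x1 AND x2
          [10.0, -20.0, -20.0]]  # hidden unit 2: approximately (NOT x1) AND (NOT x2)
theta2 = [[-10.0, 20.0, 20.0]]   # output unit: approximately OR of the hidden units

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        h = forward_propagate([theta1, theta2], [x1, x2])[0]
        print(x1, x2, round(h))
```

Rounded to the nearest integer, the output is 1 when the two inputs are equal and 0 otherwise, which is the kind of non-linear hypothesis that needs a hidden layer.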

Reference
Machine Learning, Coursera, Andrew Ng
