[Deep Learning from Scratch] Chapter 5. Backpropagation

sunnyshiny 2023. 2. 14. 18:29

Backpropagation¶

Chain Rule

$\frac{\partial z}{\partial x} =\frac{\partial z}{\partial t}\frac{\partial t}{\partial x} $
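
As a concrete check of the chain rule, take $z = t^2$ with $t = x + y$ (an example chosen here for illustration). Then $\frac{\partial z}{\partial t} = 2t$ and $\frac{\partial t}{\partial x} = 1$, so $\frac{\partial z}{\partial x} = 2(x + y)$. The sketch below compares that product with a central-difference estimate; the function names are hypothetical and not part of the original notebook.

```python
def z_of(x, y):
    # z = t^2 with t = x + y
    t = x + y
    return t ** 2


def dz_dx_chain(x, y):
    # chain rule: dz/dx = dz/dt * dt/dx
    t = x + y
    dz_dt = 2 * t
    dt_dx = 1
    return dz_dt * dt_dx


def dz_dx_numeric(x, y, h=1e-4):
    # central difference with respect to x
    return (z_of(x + h, y) - z_of(x - h, y)) / (2 * h)


x, y = 3.0, 2.0
analytic = dz_dx_chain(x, y)   # 2 * (3 + 2) = 10
numeric = dz_dx_numeric(x, y)
print(analytic, numeric)
```

The two values should agree up to floating-point rounding.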

Multiplication layer¶

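The backward pass multiplies the upstream derivative by the forward-pass inputs with their roles swapped: the gradient for x is the upstream value times y, and the gradient for y is the upstream value times x.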

In [1]:

class MulLayer:
    def __init__(self):  # the forward-pass inputs are stored here so that backward can use them
        self.x = None
        self.y = None
        
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        return out
    
    def backward(self,dout):
        dx = dout * self.y  # x and y are swapped
        dy = dout * self.x 
        return dx, dy

In [3]:

apple = 100
apple_num = 2
tax = 1.1 

# layers
mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward propagation
apple_price = mul_apple_layer.forward(apple, apple_num)
price  = mul_tax_layer.forward(apple_price, tax)
print('Apple price:', price)

# backpropagation
dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)
print('diff of apple:{}, diff of apple number:{}, diff of tax:{}'.format(dapple, dapple_num, dtax))

Apple price: 220.00000000000003
diff of apple:2.2, diff of apple number:110.00000000000001, diff of tax:200
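
The three derivatives printed above can be cross-checked without any layer classes by perturbing each input slightly and observing how the final price changes. This is a minimal sketch; `total_price` and `numeric_diff` are helper names introduced here for illustration only.

```python
def total_price(apple, apple_num, tax):
    # same computation as the two chained MulLayer forward passes
    return apple * apple_num * tax


def numeric_diff(f, args, i, h=1e-4):
    # central difference of f with respect to its i-th argument
    up = list(args)
    up[i] += h
    down = list(args)
    down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)


args = (100.0, 2.0, 1.1)
dapple = numeric_diff(total_price, args, 0)      # close to 2.2
dapple_num = numeric_diff(total_price, args, 1)  # close to 110
dtax = numeric_diff(total_price, args, 2)        # close to 200
print(dapple, dapple_num, dtax)
```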

Addition layer¶

The backward pass only multiplies the upstream derivative by 1, so the incoming value is passed unchanged to the next node.

In [4]:

class AddLayer:
    def __init__(self):
        pass  # the addition layer has nothing to initialize
    
    def forward(self, x, y):
        out = x + y
        return out
    
    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1
        return dx, dy

In [5]:

apple = 100
apple_num = 2
orange =150
orange_num = 3
tax = 1.1

# layers
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange = AddLayer()
mul_tax_layer = MulLayer()

# forward pass
apple_price = mul_apple_layer.forward(apple, apple_num)
orange_price = mul_orange_layer.forward(orange, orange_num)
all_price = add_apple_orange.forward(apple_price, orange_price)
price = mul_tax_layer.forward(all_price, tax)

# backward pass
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)
dapple_price, dorange_price = add_apple_orange.backward(dall_price)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

print('price',price)
print(dapple, dapple_num, dorange, dorange_num, dtax)

price 715.0000000000001
2.2 110.00000000000001 3.3000000000000003 165.0 650

Implementing activation function layers¶

ReLU¶

Activation function: $y= \begin{cases} x, & (x > 0) \\ 0, & (x\le 0) \end{cases}$
Derivative: $\frac{\partial y}{\partial x}= \begin{cases} 1, & (x > 0) \\ 0, & (x\le 0) \end{cases}$
If the forward-pass input is 0 or less, the value passed backward must be 0.

In [6]:

class Relu:
    def __init__(self):
        self.mask = None
        
    def forward(self, x):
        self.mask = (x <= 0)  # boolean array: True where x <= 0, False where x > 0
        out = x.copy()
        out[self.mask] = 0
        return out
    
    def backward(self, dout):
        dout[self.mask] = 0
        dx = dout
        return dx
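
To see the mask at work, the sketch below runs a small array through a ReLU layer. It repeats the class so that the snippet runs on its own, with one deliberate difference from the version above: `backward` copies `dout` instead of zeroing it in place, so the caller's array is left untouched. The input values are arbitrary.

```python
import numpy as np


class Relu:
    def __init__(self):
        self.mask = None

    def forward(self, x):
        self.mask = (x <= 0)   # True where the input is 0 or less
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        dx = dout.copy()       # copy so the caller's array is not modified
        dx[self.mask] = 0      # no gradient flows where the input was <= 0
        return dx


x = np.array([[1.0, -0.5], [-2.0, 3.0]])
relu = Relu()
out = relu.forward(x)
dx = relu.backward(np.ones_like(x))
print(relu.mask)
print(out)
print(dx)
```

Positions where `x` was positive pass both the value and the gradient through; every other position is set to 0 in both directions.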
    

Sigmoid¶

Activation function:
$\sigma(x)=\frac{1}{1+\exp(-x)}$

In [7]:

import numpy as np


class Sigmoid:
    def __init__(self):
        self.out = None
        
    def forward(self, x):
        out  = 1/ (1+np.exp(-x))
        self.out = out  # keep the forward output for use in the backward pass
        return out
    
    def backward(self, dout):
        dx = dout *(1-self.out) * self.out
        
        return dx
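
The factor `(1-self.out) * self.out` in `backward` comes from differentiating the sigmoid directly. Writing $y = \sigma(x)$:

$$\frac{\partial y}{\partial x} = \frac{\exp(-x)}{(1+\exp(-x))^2} = \frac{1}{1+\exp(-x)}\cdot\frac{\exp(-x)}{1+\exp(-x)} = y(1-y)$$

The backward pass therefore needs only the stored forward output: $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\, y(1-y)$.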

Implementing the Affine layer¶

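The matrix product computed in a neural network's forward pass is called an affine transformation in geometry. Its gradients are $$ \frac{\partial L}{\partial X} = \frac{\partial L}{\partial Y}W^T \\ \frac{\partial L}{\partial W} = X^T \frac{\partial L}{\partial Y}$$
With a batch of two samples, the bias is added to each sample separately, so in the backward pass the bias gradient is the sum of the upstream gradient over the samples.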

In [1]:

import numpy as np

X_dot_W = np.array([[0, 0, 0],[10, 10, 10]])
B = np.array([1, 2, 3])
X_dot_W+B

Out[1]:

array([[ 1,  2,  3],
       [11, 12, 13]])

In [4]:

dY = np.array([[1, 2, 3],[4, 5, 6]])
dB = np.sum(dY, axis=0)
dB

Out[4]:

array([5, 7, 9])

In [5]:

class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None
        self.dW = None
        self.db = None
        
    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        return out
    
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)  # sum over the batch axis (axis 0)
        return dx
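
A quick sanity check for these formulas is that every gradient must have the same shape as the quantity it is taken with respect to. The sketch below applies the three backward expressions to random data; the sizes (2 samples, 4 inputs, 3 outputs) and the all-ones upstream gradient are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4))   # batch of 2 samples, 4 features each
W = rng.standard_normal((4, 3))   # weights mapping 4 inputs to 3 outputs
b = np.zeros(3)

out = np.dot(x, W) + b            # forward pass, shape (2, 3)
dout = np.ones_like(out)          # stand-in for the upstream gradient

dx = np.dot(dout, W.T)            # same shape as x
dW = np.dot(x.T, dout)            # same shape as W
db = np.sum(dout, axis=0)         # same shape as b

print(out.shape, dx.shape, dW.shape, db.shape)
```

If a transpose is misplaced, the shapes stop matching (or `np.dot` raises an error), which makes this an inexpensive first test before a full gradient check.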
        

Softmax with Loss¶

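At inference time a neural network generally does not use the softmax layer, but it is required for training.
The backward pass of the softmax layer (combined with the cross-entropy loss) is prediction - true value, the difference between the output and the label, so the error is propagated directly.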

In [8]:

class SoftmaxWithLoss:
    def __init__(self):
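        self.loss = None  # loss
        self.y = None  # softmax output
        self.t = None  # ground-truth labels (one-hot)
        
    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)  # softmax and cross_entropy_error are helper functions not defined in this post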
        self.loss = cross_entropy_error(self.y, self.t)
        return self.loss
    
    def backward(self, dout =1):
        batch_size = self.t.shape[0]
        dx = (self.y - self.t) / batch_size
        return dx
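
The class above relies on a softmax function and on `cross_entropy_error`, neither of which is defined in this post. The following is a minimal sketch of what such helpers could look like, assuming 2-D batched inputs and one-hot labels; it is not necessarily identical to the book's own helper code. The input values are arbitrary.

```python
import numpy as np


def softmax(x):
    # subtract the row-wise max for numerical stability before exponentiating
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)


def cross_entropy_error(y, t):
    # y: softmax output, t: one-hot labels, both of shape (batch, classes)
    batch_size = y.shape[0]
    # the small constant avoids log(0)
    return -np.sum(t * np.log(y + 1e-7)) / batch_size


x = np.array([[0.3, 2.9, 4.0], [1.0, 1.0, 1.0]])
t = np.array([[0, 0, 1], [0, 1, 0]])

y = softmax(x)
loss = cross_entropy_error(y, t)
dx = (y - t) / t.shape[0]   # same formula as SoftmaxWithLoss.backward
print(loss)
print(dx)
```

Each row of `y` sums to 1 and each row of `t` sums to 1, so each row of `dx` sums to 0: the gradient only redistributes probability mass between classes.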

Gradient check¶

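Numerical differentiation is simple to implement, so bugs are unlikely to hide in it, whereas backpropagation is more complex to implement and mistakes are common. The result of backpropagation is therefore compared with the result of numerical differentiation to verify that backpropagation has been implemented correctly.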
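
The comparison can be sketched on a single Affine layer followed by softmax with cross-entropy loss. Everything below is an illustrative assumption rather than the book's code: the helper names, the tiny random network, and the use of central differences with `h=1e-4`. The `log` is taken without a small stabilizing constant so that the analytic and numerical gradients refer to exactly the same function.

```python
import numpy as np


def softmax(x):
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)


def loss_fn(W, b, x, t):
    # Affine forward, then softmax and cross-entropy averaged over the batch
    y = softmax(np.dot(x, W) + b)
    return -np.sum(t * np.log(y)) / x.shape[0]


def backprop_grads(W, b, x, t):
    y = softmax(np.dot(x, W) + b)
    dout = (y - t) / x.shape[0]                     # softmax-with-loss backward
    return np.dot(x.T, dout), np.sum(dout, axis=0)  # Affine backward: dW, db


def numerical_grad(f, param, h=1e-4):
    # central difference for every element of param (modified in place, then restored)
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        orig = param[idx]
        param[idx] = orig + h
        f_plus = f()
        param[idx] = orig - h
        f_minus = f()
        param[idx] = orig
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 4))          # 3 samples, 4 features
t = np.eye(3)[[0, 2, 1]]                 # one-hot labels for 3 classes
W = 0.1 * rng.standard_normal((4, 3))
b = np.zeros(3)

dW, db = backprop_grads(W, b, x, t)
f = lambda: loss_fn(W, b, x, t)
dW_num = numerical_grad(f, W)
db_num = numerical_grad(f, b)

diff_W = np.mean(np.abs(dW - dW_num))
diff_b = np.mean(np.abs(db - db_num))
print('W:', diff_W)
print('b:', diff_b)
```

Mean absolute differences that are very small relative to the gradients themselves suggest the backward pass is consistent with the forward pass; a large difference usually points to a bug in one of the `backward` formulas.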

Reference

Deep Learning from Scratch (밑바닥부터 시작하는 딥러닝)

ratsgo's blog

https://m.blog.naver.com/PostView.naver?isHttpsRedirect=true&blogId=sky930425&logNo=221545712859

https://m.blog.naver.com/riverrun17/221900860949

https://velog.io/@gmlwlswldbs/%EC%86%8C%ED%94%84%ED%8A%B8%EB%A7%A5%EC%8A%A4-%EC%97%AD%EC%A0%84%ED%8C%8C

https://velog.io/@clayryu328/%EB%B0%91%EB%B0%94%EB%8B%A5%EB%B6%80%ED%84%B0-%EC%8B%9C%EC%9E%91%ED%95%98%EB%8A%94-%EB%94%A5%EB%9F%AC%EB%8B%9D-7-sigmoid-relu%EC%B8%B5%EC%9D%98-%EC%97%AD%EC%A0%84%ED%8C%8C