[Machine Learning] Linear Classification

Linear classification (using cross entropy as the loss function)

  • Cross entropy is also called log loss; a linear classifier trained with it is logistic regression.
  • The parameters (w1, w2, b) are computed one by one with gradient descent.

Cross entropy is a standard loss function for classification. The variant used for binary classification is called binary cross entropy.
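
As a rough illustration, binary cross entropy for labels y in {0, 1} and predicted probabilities p is the mean of -(y*log(p) + (1-y)*log(1-p)). The sketch below is a minimal NumPy version; the function name `binary_cross_entropy` and the clipping constant `eps` are assumptions introduced here for illustration, not part of the original post.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy (log loss) for labels in {0, 1}."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so that log() stays finite.
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    return float(-(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)).mean())

# Confident, correct predictions give a small loss ...
good = binary_cross_entropy([0, 1, 1], [0.1, 0.9, 0.8])
# ... while confident, wrong predictions give a large one.
bad = binary_cross_entropy([0, 1, 1], [0.9, 0.1, 0.2])
print(good, bad)
```

The loss grows without bound as a confident prediction moves toward the wrong label, which is why the clipping step is included in this sketch.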

Make dataset

The make_blobs function can generate a dataset that is well suited to linear classification.

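import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

N = 500
# Two Gaussian blobs in 2-D; y holds the class labels 0 and 1
(X, y) = make_blobs(n_samples=N, n_features=2, centers=2, cluster_std=2.0, random_state=17)
x1, x2 = X[:,0], X[:,1]
plt.scatter(x1, x2, c=y)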


y[:10]
array([0, 1, 0, 1, 1, 0, 0, 0, 1, 0])


# Random initial parameters
w1 = np.random.randn()
w2 = np.random.randn()
b  = np.random.randn()

def sigmoid_activation(z):
    return 1.0 / (1 + np.exp(-z))

lossHistory = []
epochs = 500
alpha = 0.03    # learning rate

for epoch in np.arange(epochs):
    z = w1*x1 + w2*x2 + b
    y_hat = sigmoid_activation(z)       # prediction

    loss = -((y*np.log(y_hat) + (1-y)*np.log(1-y_hat))).mean()  # loss = cross entropy
    lossHistory.append(loss)        # record the loss so it can be plotted below
    dloss_dz = y_hat - y            # per-sample d(loss)/dz for sigmoid + cross entropy
    w1_deriv = dloss_dz * x1        # d(loss)/dw1 = d(loss)/dz * dz/dw1
    w2_deriv = dloss_dz * x2
    b_deriv = dloss_dz * 1
    w1 = w1 - (alpha * w1_deriv).mean()
    w2 = w2 - (alpha * w2_deriv).mean()
    b  = b  - (alpha * b_deriv).mean()
    if epoch % 10 == 0:
        print('epoch=', epoch, 'loss=', loss, 'w1=', w1, 'w2=', w2, 'b=', b)

print(w1, w2, b)
accuracy = ((sigmoid_activation(w1*x1 + w2*x2 + b) > 0.5) == y).sum()/N

# construct a figure that plots the loss over time
fig = plt.figure()
plt.plot(np.arange(0, epochs), lossHistory)
fig.suptitle("Training Loss")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
epoch= 0 loss= 1.616035162169675 w1= -0.9490649093578043 w2= -0.49682963524704316 b= -0.696164229287423
epoch= 10 loss= 0.7008286524955278 w1= -0.4496766747447769 w2= -0.5260477241204355 b= -0.809026252878777
epoch= 20 loss= 0.29203905527389645 w1= -0.12827521312304996 w2= -0.5202691401157856 b= -0.8821467316261569
epoch= 30 loss= 0.1898689054567451 w1= 0.027749581072294832 w2= -0.5269227722920128 b= -0.9228796002094153
epoch= 490 loss= 0.09888960631144601 w1= 0.3732138813234712 w2= -1.0025023543610019 b= -1.4794882446903452

0.3732098201390971 -1.004432647797522 -1.4881961572489584
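
The update in the loop above relies on the fact that, for a sigmoid output combined with cross-entropy loss, the per-sample derivative with respect to z simplifies to y_hat - y. A finite-difference check is one way to convince yourself of this. The sketch below uses hypothetical helper names (`sigmoid`, `bce`) and arbitrary test values rather than the blob dataset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(z, y):
    """Per-sample cross entropy as a function of the pre-activation z."""
    p = sigmoid(z)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

z = np.array([-2.0, -0.5, 0.3, 1.7])   # arbitrary pre-activations
y = np.array([0.0, 1.0, 0.0, 1.0])     # arbitrary labels

analytic = sigmoid(z) - y               # closed form used in the training loop
h = 1e-6
numeric = (bce(z + h, y) - bce(z - h, y)) / (2 * h)   # central difference

print(np.max(np.abs(analytic - numeric)))
```

The printed value should be tiny, which indicates that the analytic and numerical derivatives agree.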



plt.scatter(X[:,0], X[:,1], c=y)
xx = np.linspace(-10,10,100)
yy = -w1/w2 * xx -b/w2
plt.plot(xx, yy)
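
The line plotted above is the set of points where the score is zero: w1*x1 + w2*x2 + b = 0, which gives x2 = -(w1/w2)*x1 - b/w2 as long as w2 is not zero. The following sketch checks that points on this line receive a predicted probability of 0.5. The parameter values and the helper name `boundary_x2` are illustrative assumptions, not results from the post.

```python
import numpy as np

w1, w2, b = 0.37, -1.0, -1.49     # illustrative values only

def boundary_x2(x1, w1, w2, b):
    """x2 coordinate of the decision boundary for a given x1 (requires w2 != 0)."""
    return -(w1 / w2) * x1 - b / w2

x1 = np.linspace(-10, 10, 5)
x2 = boundary_x2(x1, w1, w2, b)

z = w1 * x1 + w2 * x2 + b               # score along the boundary
p = 1.0 / (1.0 + np.exp(-z))            # sigmoid of the score
print(z, p)
```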


Linear classification (using hinge loss as the loss function)

  • Hinge loss is primarily used with Support Vector Machine (SVM) classifiers and expects class labels -1 and 1, so the label 0 in the dataset must be changed to -1.
  • Hinge loss penalizes not only wrong predictions but also correct predictions that are not confident (margin below 1).
  • For an input-output pair (x, y) with score f(x) = w1*x1 + w2*x2 + b, the hinge loss is:
    • L = max(0, 1 - y*f(x))
    • L = 0 if y*f(x) >= 1, and 1 - y*f(x) otherwise
    • dL/dw1 = 0 if y*f(x) >= 1, and -y*x1 otherwise (likewise -y*x2 for w2 and -y for b)
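
To make the piecewise definition concrete, here is a minimal NumPy sketch, assuming a linear score f(x) = w·x + b and labels in {-1, +1}. The function name `hinge_loss_and_grad` and the three sample points are made up for illustration.

```python
import numpy as np

def hinge_loss_and_grad(w, b, X, y):
    """Mean hinge loss and a subgradient w.r.t. (w, b); labels y must be -1 or +1."""
    margin = y * (X @ w + b)             # y * f(x) for every sample
    active = margin < 1                  # samples that incur a loss
    loss = np.maximum(0.0, 1.0 - margin).mean()
    # Subgradient: -y*x for active samples, 0 otherwise, averaged over all samples.
    grad_w = -(X[active] * y[active, None]).sum(axis=0) / len(y)
    grad_b = -y[active].sum() / len(y)
    return loss, grad_w, grad_b

X = np.array([[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]])
y = np.array([1, 1, 1])
w = np.array([1.0, 0.0])

loss, grad_w, grad_b = hinge_loss_and_grad(w, 0.0, X, y)
print(loss, grad_w, grad_b)
```

In this toy data the first point is classified correctly with margin 2 and contributes no loss, the second is correct but has margin 0.5 and is still penalized, and the third is misclassified and contributes the largest loss.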

A clear explanation of the hinge loss for support vector machines.

Make dataset

N = 500
(X, y_org) = make_blobs(n_samples=N, n_features=2, centers=2, cluster_std=2.0, random_state=17)
x1, x2 = X[:,0], X[:,1]
y = y_org.copy()
y[y==0] = -1
X[:5], y[:5], y_org[:5]
(array([[ -5.48619226,   1.21306671],
        [ -2.89056798,  -9.18025054],
        [ -1.5288614 ,   1.01129561],
        [ -7.48266658,  -9.99569036],
        [ -7.03983988, -10.35802726]]),
 array([-1,  1, -1,  1,  1]),
 array([0, 1, 0, 1, 1]))


w1, w2, b = np.random.randn(), np.random.randn(), np.random.randn()
lossHistory = []
epochs = 500
alpha = 0.03

N = len(x1)
for epoch in np.arange(epochs):

    w1_deriv, w2_deriv, b_deriv, loss = 0., 0., 0., 0.
    for i in range(N):
        score = y[i]*(w1*x1[i] + w2*x2[i] + b)
        if score <= 1: # this sample incurs a loss
            w1_deriv = w1_deriv - x1[i]*y[i]
            w2_deriv = w2_deriv - x2[i]*y[i]
            b_deriv = b_deriv - y[i]
            loss = loss + (1 - score)
        # else : derivatives are zero. loss is 0
    # mean
    w1_deriv /= float(N)
    w2_deriv /= float(N)
    b_deriv  /= float(N)
    loss /= float(N)
    lossHistory.append(loss)    # record the loss for the training-loss plot
    # update parameters
    w1 = w1 - alpha * w1_deriv
    w2 = w2 - alpha * w2_deriv
    b  =  b - alpha *  b_deriv

    if epoch % 10 == 0:
        print('epoch=', epoch, 'loss=', loss, 'w1=', w1, 'w2=', w2, 'b=', b)
print(w1, w2, b)

accuracy = ((w1*x1 + w2*x2 + b > 0) == y_org).sum()/N

epoch= 0 loss= 0.32953561452982716 w1= 0.10905148299148527 w2= -0.8707142547367681 b= -0.22266410994705235
epoch= 10 loss= 0.17294512065349388 w1= 0.2884756613203777 w2= -0.7980926952950345 b= -0.27570410994705236
epoch= 20 loss= 0.13295245866300887 w1= 0.3661718550159059 w2= -0.748069760577352 b= -0.3095441099470523
epoch= 30 loss= 0.12799019301386722 w1= 0.38203122269240997 w2= -0.7288418002221119 b= -0.33294410994705237
epoch= 490 loss= 0.06746119518336888 w1= 0.10373983094412123 w2= -0.46597462545524376 b= -1.144804109947046

0.1017903110742461 -0.4643184786720022 -1.1584841099470464
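
The per-sample inner loop above can also be written with NumPy array operations. The following is a sketch of one equivalent full-batch step, not the author's code: `hinge_step` is a hypothetical name, and the data are random placeholders rather than the blob dataset.

```python
import numpy as np

def hinge_step(w1, w2, b, x1, x2, y, alpha=0.03):
    """One full-batch subgradient step on the mean hinge loss (labels in {-1, +1})."""
    score = y * (w1 * x1 + w2 * x2 + b)
    active = score <= 1                       # same condition as the loop version
    n = len(y)
    loss = (1 - score[active]).sum() / n
    w1_deriv = -(x1[active] * y[active]).sum() / n
    w2_deriv = -(x2[active] * y[active]).sum() / n
    b_deriv = -y[active].sum() / n
    return w1 - alpha * w1_deriv, w2 - alpha * w2_deriv, b - alpha * b_deriv, loss

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=50), rng.normal(size=50)
y = np.where(x1 + x2 > 0, 1, -1)             # placeholder labels
w1, w2, b = 0.1, -0.2, 0.0
for _ in range(100):
    w1, w2, b, loss = hinge_step(w1, w2, b, x1, x2, y)
print(loss, w1, w2, b)
```

The boolean mask `active` plays the role of the `if score <= 1` branch, so only the samples that incur a loss contribute to the sums.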

[Figure: Training Loss]



plt.scatter(X[:,0], X[:,1], c=y)

# decision boundary w1*x1 + w2*x2 + b = 0, using the parameters trained above
xx = np.linspace(-15,5,100)
yy = -w1/w2 * xx -b/w2
plt.plot(xx, yy, c='r')


Use library function

Two representative scikit-learn models for linear classification are SGDClassifier and LogisticRegression.

SGDClassifier uses hinge loss as its default loss function.

LogisticRegression uses cross entropy (log loss) as its loss function.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.datasets import make_blobs

# Make dataset
N = 500
(X, y) = make_blobs(n_samples=N, n_features=2, centers=2, cluster_std=2.0, random_state=17)

# Create models and train
clf = SGDClassifier()       # hinge loss by default; no random_state, so results vary between runs
clf.fit(X, y)
print("SGDClassifier: ", clf.score(X, y))
print(clf.coef_, clf.intercept_)

log = LogisticRegression()
log.fit(X, y)
print("Logistic Regression: ", log.score(X, y))
print(log.coef_, log.intercept_)

# Plotting
plt.scatter(X[:,0], X[:,1], c=y)
xx = np.linspace(-15,5,100)

# SGDClassifier decision boundary (red)
w1, w2, b = clf.coef_[0,0], clf.coef_[0,1], clf.intercept_[0]
yy = -w1/w2 * xx -b/w2
plt.plot(xx, yy, c='r')

# LogisticRegression decision boundary (blue)
w1, w2, b = log.coef_[0,0], log.coef_[0,1], log.intercept_[0]
yy = -w1/w2 * xx -b/w2
plt.plot(xx, yy, c='b')
SGDClassifier:  0.984
[[ -8.56625092 -34.52822982]] [-211.3932073]
Logistic Regression:  0.994
[[-0.46621709 -1.9549905 ]] [-10.32380395]
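
The coef_ and intercept_ values printed above define a linear score w·x + b, and the predicted class follows from the sign of that score. The short sketch below checks this against scikit-learn's decision_function and predict. Note that random_state=0 on the classifier is an assumption added here for reproducibility and is not in the original code, so the fitted values will differ from the output above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier

X, y = make_blobs(n_samples=500, n_features=2, centers=2, cluster_std=2.0, random_state=17)
clf = SGDClassifier(random_state=0).fit(X, y)

# Linear score computed by hand from the fitted parameters.
manual_score = X @ clf.coef_[0] + clf.intercept_[0]

# decision_function returns the same score; predict thresholds it at 0.
same_score = np.allclose(manual_score, clf.decision_function(X))
same_label = np.array_equal((manual_score > 0).astype(int), clf.predict(X))
print(same_score, same_label)
```

Because only the sign of the score matters for the prediction, the large coefficient magnitudes reported for SGDClassifier describe the same kind of boundary line as the much smaller LogisticRegression coefficients.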


