[Machine Learning] Random Forest



Random Forest Classifier

![Ensemble Learning: Bagging & Boosting, by Fernando López (Towards Data Science)](https://miro.medium.com/max/2000/1*zTgGBTQIMlASWm5QuS2UpA.jpeg)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


Decision Tree

cancer = load_breast_cancer()
np.random.seed(9)
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target)

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
# 0.916083916083916

X_train.shape, X_test.shape, cancer.data.shape
# ((426, 30), (143, 30), (569, 30))


feature_importances_:

  • In a single tree, it measures how much each feature contributes to decreasing the weighted impurity (see the sketch below).
  • In a random forest, the decrease in impurity is averaged over the trees.
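
To see where these numbers come from, the per-feature impurity decrease can be recomputed directly from the fitted tree. This is a minimal sketch, assuming the clf fitted above and scikit-learn's tree_ internals (children_left is -1 at a leaf): it accumulates each split's weighted impurity decrease and normalizes at the end.

# Sketch: recompute feature_importances_ from the fitted tree's internals
tree = clf.tree_
importances = np.zeros(clf.n_features_in_)
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf node: no split, so no impurity decrease
        continue
    n = tree.weighted_n_node_samples
    decrease = (n[node] * tree.impurity[node]
                - n[left] * tree.impurity[left]
                - n[right] * tree.impurity[right])
    importances[tree.feature[node]] += decrease
importances /= importances.sum()  # normalize so the importances sum to 1
print(np.allclose(importances, clf.feature_importances_))  # expected: True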
# Important features according to the decision tree

list(zip(cancer.feature_names, clf.feature_importances_.round(4)))[:10]
[('mean radius', 0.0),
 ('mean texture', 0.0417),
 ('mean perimeter', 0.0),
 ('mean area', 0.0),
 ('mean smoothness', 0.0),
 ('mean compactness', 0.0),
 ('mean concavity', 0.0),
 ('mean concave points', 0.0426),
 ('mean symmetry', 0.0114),
 ('mean fractal dimension', 0.0)]


# Sort by importance and show the top 10 features

df = pd.DataFrame({'feature': cancer.feature_names, 'importance': clf.feature_importances_})
df = df.sort_values('importance', ascending=False)
print(df.head(10))
                 feature  importance
22       worst perimeter    0.694689
27  worst concave points    0.121068
7    mean concave points    0.042647
1           mean texture    0.041720
21         worst texture    0.039639
13            area error    0.017216
20          worst radius    0.017188
15     compactness error    0.012042
8          mean symmetry    0.011405
14      smoothness error    0.002385


# Horizontal bar chart of the sorted feature importances
x = df.feature
y = df.importance
ypos = np.arange(len(x))

plt.figure(figsize=(10,7))
plt.barh(x, y)
plt.yticks(ypos, x)
plt.xlabel('Importance')
plt.ylabel('Variable')
plt.xlim(0, 1)
plt.ylim(-1, len(x))
plt.show()

(Figure: feature importance bar chart for the decision tree; a few features dominate)


Random Forest (Bagging)

  • Repeatedly draws samples with replacement (bootstrap samples), trains a model on each, and averages the results (see the hand-rolled sketch below).
  • Mitigates overfitting and produces a more general model.
  • Models are trained in parallel (each tree is independent).
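
To make the bootstrap-and-aggregate idea concrete, here is a hand-rolled bagging sketch (illustrative only; RandomForestClassifier additionally considers a random subset of features at each split, which this sketch does not):

from sklearn.utils import resample

n_trees = 100
trees = []
for _ in range(n_trees):
    X_bs, y_bs = resample(X_train, y_train)  # bootstrap: sample with replacement
    trees.append(DecisionTreeClassifier().fit(X_bs, y_bs))

# Aggregate by majority vote over the ensemble
votes = np.mean([t.predict(X_test) for t in trees], axis=0)
y_pred = (votes >= 0.5).astype(int)
print((y_pred == y_test).mean())  # test accuracy of the hand-rolled ensemble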
# Important features according to the random forest

rfc = RandomForestClassifier(n_estimators=500)
rfc.fit(X_train, y_train)
print(rfc.score(X_test, y_test))
# 0.951048951048951

df = pd.DataFrame({'feature': cancer.feature_names, 'importance': rfc.feature_importances_})
df = df.sort_values('importance', ascending=False)

x = df.feature
y = df.importance
ypos = np.arange(len(x))

plt.figure(figsize=(10, 7))
plt.barh(x, y)
plt.yticks(ypos, x)
plt.xlabel('Importance')
plt.ylabel('Variable')
plt.xlim(0, 1)
plt.ylim(-1, len(x))
plt.show()

(Figure: feature importance bar chart for the random forest; importance is spread across many features)

  • We can see that the features are used much more evenly than in the single decision tree.
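
One reason the importance spreads out: at every split a random forest considers only a random subset of the features (max_features, which is 'sqrt' by default for classification in current scikit-learn). As a quick sketch, turning that subsampling off (max_features=None) lets every tree greedily pick the same strong features, so the largest single importance typically grows again:

rfc_all = RandomForestClassifier(n_estimators=500, max_features=None)
rfc_all.fit(X_train, y_train)
print(rfc.feature_importances_.max().round(3),      # with feature subsampling
      rfc_all.feature_importances_.max().round(3))  # without: typically larger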


Gradient Boosting (Boosting)

  • Bagging makes independent predictions from independently drawn input data (sampled with replacement), whereas in boosting each model influences the next one.
  • Focuses on getting the hard-to-predict examples right.
  • Models are trained serially, one after another (see the staged-accuracy sketch after the code below).
from sklearn.ensemble import GradientBoostingClassifier

gbc = GradientBoostingClassifier(n_estimators=500)
gbc.fit(X_train, y_train)
print(gbc.score(X_test, y_test))
# 0.9790209790209791
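
Because the trees are built sequentially, we can watch the test accuracy improve as stages are added. A small sketch using staged_predict on the gbc fitted above:

# Sketch: test accuracy after every 100 boosting stages
for i, y_stage in enumerate(gbc.staged_predict(X_test), start=1):
    if i % 100 == 0:
        print(i, 'trees:', round((y_stage == y_test).mean(), 4))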
df = pd.DataFrame({'feature': cancer.feature_names, 'importance': gbc.feature_importances_})
df = df.sort_values('importance', ascending=False)

x = df.feature
y = df.importance
ypos = np.arange(len(x))

plt.figure(figsize=(10, 7))
plt.barh(x, y)
plt.yticks(ypos, x)
plt.xlabel('Importance')
plt.ylabel('Variable')
plt.xlim(0, 1)
plt.ylim(-1, len(x))
plt.show()

(Figure: feature importance bar chart for gradient boosting)


