[AITech] 20220204 - Seaborn Advances


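What I Learned

This post covers how to pack more information into a visualization by combining multiple charts.

Previously we drew one plot on a single ax. The APIs in this post are figure-level: they lay out the whole figure for you.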

import pandas as pd
import seaborn as sns

student = pd.read_csv('./StudentsPerformance.csv')
'''
   gender race/ethnicity parental level of education         lunch  \
0  female        group B           bachelor's degree      standard   
1  female        group C                some college      standard   
2  female        group B             master's degree      standard   
3    male        group A          associate's degree  free/reduced   
4    male        group C                some college      standard  
  test preparation course  math score  reading score  writing score  
0                    none          72             72             74  
1               completed          69             90             88  
2                    none          90             95             93  
3                    none          47             57             44  
4                    none          76             78             75 
'''
iris = pd.read_csv('./iris.csv')
'''
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
'''

Joint Plot

Joint Plot shows the joint distribution of two features, as covered in the distribution API post, together with the individual distribution of each feature.

sns.jointplot(x='math score', y='reading score',data=student,
             height=7)

sns.jointplot(x='math score', y='reading score',data=student,
              hue='gender'
             )

sns.jointplot(x='math score', y='reading score',data=student,
              kind='reg', # one of: 'scatter', 'kde', 'hist', 'hex', 'reg', 'resid'
             )

Pair Plot

This function visualizes the pair-wise relationships in a dataset.

sns.pairplot(data=iris)

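You can customize it with the following parameters.

hue: the column used to color the data by category
kind: the plot type for the off-diagonal subplots
- scatter, kde, hist, reg
diag_kind: the plot type for the subplots on the diagonal
- auto, hist, kde, None
corner: a pair plot is symmetric about the diagonal, so corner=True keeps only the diagonal and the plots below it.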

sns.pairplot(data=iris, hue='Species', 
             kind='hist',
             diag_kind='kde',
             corner=True)

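Using Facet Grid

Facet Grid refers to a multi-panel visualization, like pairplot.

The difference is that pairplot looks at feature-feature relationships, while Facet Grid can also show relationships between the categories of one feature and the categories of another.

A single plot is possible too, but here it is best to focus on viewing as many pairs as possible to examine the relationships.

Four major functions are built on top of Facet Grid.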

catplot : Categorical
displot : Distribution
relplot : Relational
lmplot : Regression

catplot

We have already covered many of these plots, so I will skip the individual explanations. catplot supports the following kinds.

Categorical scatterplots:
- stripplot() (with kind="strip"; the default)
- swarmplot() (with kind="swarm")
Categorical distribution plots:
- boxplot() (with kind="box")
- violinplot() (with kind="violin")
- boxenplot() (with kind="boxen")
Categorical estimate plots:
- pointplot() (with kind="point")
- barplot() (with kind="bar")
- countplot() (with kind="count")

sns.catplot(x="race/ethnicity", y="math score", hue="gender", data=student,
            kind='box', 
            col='lunch', row='test preparation course'
           )
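With both `row` and `col` set, the panels form a rows × columns grid, one panel per category combination. A minimal sketch on synthetic data (an assumption, not the real dataset) that also shortens the panel titles with `set_titles`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical values).
rng = np.random.default_rng(4)
n = 240
demo = pd.DataFrame({
    "race/ethnicity": rng.choice(["group A", "group B", "group C"], size=n),
    "math score": rng.normal(66, 15, size=n),
    "gender": np.tile(["female", "female", "male", "male"], n // 4),
    "lunch": np.tile(["standard", "free/reduced"], n // 2),
    "test preparation course": np.repeat(["none", "completed"], n // 2),
})

g = sns.catplot(data=demo, x="race/ethnicity", y="math score", hue="gender",
                kind="box", col="lunch", row="test preparation course")

# Two row categories x two column categories gives a 2 x 2 grid of panels.
print(g.axes.shape)

# Replace the default titles with a shorter template.
g.set_titles(template="{row_name} / {col_name}")
print(g.axes[0, 0].get_title())
```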

displot

displot supports the following kinds.

histplot() (with kind="hist"; the default)
kdeplot() (with kind="kde")
ecdfplot() (with kind="ecdf"; univariate-only)

sns.displot(x="math score", hue="gender", data=student,
           col='race/ethnicity', kind='kde', fill=True,
            col_order=sorted(student['race/ethnicity'].unique())
           )

relplot

relplot supports the following kinds.

scatterplot() (with kind="scatter"; the default)
lineplot() (with kind="line")

sns.relplot(x="math score", y='reading score', hue="gender", data=student,
           col='lunch')

lmplot

lmplot is built on the following function.

regplot()

sns.lmplot(x="math score", y='reading score', hue="gender", data=student)
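Because lmplot is figure-level, it also accepts `col` and `row`, so the regression can be fit separately per panel as well as per `hue`. A minimal sketch on synthetic data (an assumption, not the real dataset):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (hypothetical values).
rng = np.random.default_rng(6)
math = rng.normal(66, 15, size=160)
demo = pd.DataFrame({
    "math score": math,
    "reading score": 0.8 * math + rng.normal(15, 8, size=160),
    "gender": np.tile(["female", "male"], 80),
    "lunch": np.repeat(["standard", "free/reduced"], 80),
})

# One panel per lunch type; within each panel, one fit per gender.
g = sns.lmplot(data=demo, x="math score", y="reading score",
               hue="gender", col="lunch")

print(type(g).__name__)
print(g.axes.shape)
```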


wowo0709
