KNN classification - iris
율쟝
2020. 9. 18. 17:23
Loading the dataset
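# Using the seaborn library
# (sns.load_dataset downloads the CSV from seaborn's online data repository and caches it locally)
import seaborn as sns
iris = sns.load_dataset('iris')
# Features: the four measurement columns; target: the species name
X = iris.drop('species', axis=1)
y = iris['species']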
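# Alternative: using the sklearn library
# (stored as iris_sk so it does not overwrite the seaborn DataFrame used below)
from sklearn.datasets import load_iris
iris_sk = load_iris()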
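Unlike seaborn, sklearn returns a Bunch object rather than a DataFrame: the measurements are in `.data`, the species are already integer-encoded in `.target`, and `.target_names` holds the names in code order. A minimal sketch of getting the equivalent of X and y from it (the variable names `X_sk` and `y_sk` are only illustrative):

```python
from sklearn.datasets import load_iris

iris_sk = load_iris()

# Feature matrix: 150 samples x 4 measurements (sepal/petal length and width, in cm)
X_sk = iris_sk.data
# Targets are already integer codes 0, 1, 2
y_sk = iris_sk.target

print(iris_sk.feature_names)
print(iris_sk.target_names)  # ['setosa' 'versicolor' 'virginica']
print(X_sk.shape, y_sk.shape)  # (150, 4) (150,)
```

Because `target` is already 0, 1, 2, the LabelEncoder step is only needed for the seaborn version, where species is a string column. `load_iris(return_X_y=True)` returns the `(data, target)` pair directly.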
<seaborn.load_dataset>
manual: seaborn.pydata.org/generated/seaborn.load_dataset.html?highlight=load_dataset#seaborn.load_dataset
<sklearn.datasets.load_iris>
manual: scikit-learn.org/stable/modules/classes.html?highlight=datasets#module-sklearn.datasets
Encoding the categories as numbers
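The class names (species) are strings rather than numbers, so LabelEncoder maps each one to an integer label: setosa → 0, versicolor → 1, virginica → 2.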
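from sklearn.preprocessing import LabelEncoder
classle = LabelEncoder()
# Learn the sorted class names and encode each species as 0, 1 or 2
# (y still holds the species-name column selected from the seaborn DataFrame)
y = classle.fit_transform(y.values)
# inverse_transform maps the integer codes back to the original names
yo = classle.inverse_transform(y)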
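LabelEncoder sorts the distinct class names and assigns each one its position in that sorted order; `inverse_transform` looks the position back up, which is why `yo` reproduces the original strings. A dependency-free sketch of the same idea (the helpers `label_encode` and `label_decode` are hypothetical, not part of sklearn):

```python
def label_encode(labels):
    """Map each distinct label to its index in sorted order, like LabelEncoder."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return classes, [index[name] for name in labels]


def label_decode(classes, codes):
    """Invert label_encode by looking each code up in the sorted class list."""
    return [classes[code] for code in codes]


species = ["setosa", "virginica", "versicolor", "setosa"]
classes, codes = label_encode(species)
print(classes)  # ['setosa', 'versicolor', 'virginica']
print(codes)    # [0, 2, 1, 0]
print(label_decode(classes, codes) == species)  # True
```

Sorting first is what makes the codes deterministic: the same set of names always gets the same integers, regardless of the order in which the rows appear.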
Splitting the data
from sklearn.model_selection import train_test_split
# Hold out 30% for testing; stratify=y keeps the class proportions of the full data in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1, stratify=y)
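To see what `stratify=y` does, count the samples per class in each split. The sketch below reloads iris through sklearn so that it runs on its own (labels already encoded as 0, 1, 2) and uses the same split arguments as above. Each class has 50 samples, so a 30% stratified test set gets 15 of each.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# return_X_y=True gives (features, integer labels) directly
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Number of samples per class (0, 1, 2) in each split
print(len(y_train), np.bincount(y_train))  # 105 [35 35 35]
print(len(y_test), np.bincount(y_test))    # 45 [15 15 15]
```

Without `stratify`, a random 45-sample test set can end up with noticeably more of one species than another, which makes the test accuracy harder to compare across runs.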
Training
from sklearn.neighbors import KNeighborsClassifier
# k = 5 neighbors; p=2 makes the Minkowski distance the ordinary Euclidean distance
knn = KNeighborsClassifier(n_neighbors=5, p=2)
knn.fit(X_train, y_train)
y_train_pred = knn.predict(X_train)
y_test_pred = knn.predict(X_test)
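KNeighborsClassifier with `n_neighbors=5, p=2` predicts a point's class by finding the five training points closest to it in Minkowski distance with exponent p (p=2 is Euclidean distance) and taking the majority class among them. A from-scratch NumPy sketch of that rule, for intuition only: the function `knn_predict` is hypothetical, it uses plain uniform voting (sklearn's default `weights='uniform'`), and it breaks ties toward the smallest label via `argmax`, which is an assumption that may not match sklearn's tie-breaking in every case.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5, p=2):
    """Predict non-negative integer labels for X_query by majority vote of the k nearest training points.

    Distance is Minkowski with exponent p (p=2 is Euclidean, p=1 is Manhattan).
    """
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in X_query:
        # Minkowski distance from this query point to every training point
        dists = np.sum(np.abs(X_train - x) ** p, axis=1) ** (1.0 / p)
        # Indices of the k smallest distances
        nearest = np.argsort(dists)[:k]
        # Majority vote; argmax returns the smallest label on ties
        votes = np.bincount(y_train[nearest])
        preds.append(int(np.argmax(votes)))
    return np.array(preds)


# Toy example: two well-separated clusters labelled 0 and 1
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_toy, y_toy, [[0.5, 0.5], [5.5, 5.5]], k=3))  # [0 1]
```

Note that KNN has no real training phase: `fit` essentially stores the data, and all the distance work happens at prediction time, which is why prediction cost grows with the size of the training set.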
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_test_pred))
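accuracy_score is the fraction of predictions that equal the true labels. The code above also computes `y_train_pred`, so the same measure can be applied to the training set; a large gap between training and test accuracy would hint at overfitting. A small sketch of the computation on made-up label arrays (the helper `accuracy` is hypothetical, and the values are illustrative, not results from this post):

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# Illustrative labels only: 4 of 5 predictions are correct
y_true_demo = [0, 1, 2, 2, 1]
y_pred_demo = [0, 1, 2, 1, 1]
print(accuracy(y_true_demo, y_pred_demo))  # 0.8
```

Applied to the variables above, `accuracy(y_train, y_train_pred)` and `accuracy(y_test, y_test_pred)` give the training and test accuracy; the second should equal what `accuracy_score(y_test, y_test_pred)` prints.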