Machine Learning

KNN Regression (K-Nearest Neighbors): Concept and Python Example

흰곰곰 2019. 10. 9. 21:06

KNN (K-Nearest Neighbors) Regression

1. Definition

  • Predict the target of a new point as the average of the target values of its nearest points.

2. How it works

  • Suppose we want to predict point 13 with k = 3.
    knn_regression_image_1
  • The three nearest points selected are points 6, 5, and 1, so Predict(13) = (77 + 72 + 60) / 3 ≈ 69.67
    knn_regression_image_2
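The averaging step above can be sketched in a few lines of NumPy. The data here is made up purely to mirror the worked example (the three nearest points carry the values 77, 72, and 60); it is not the article's actual dataset.

```python
import numpy as np

# Hypothetical 1-D data: six points with one feature each and a target value.
# Values chosen so the three neighbors nearest to x = 2.0 have targets 77, 72, 60.
X = np.array([[1.0], [3.2], [8.0], [9.0], [2.5], [1.6]])
y = np.array([60.0, 66.0, 90.0, 95.0, 72.0, 77.0])

def knn_predict(X, y, x_new, k=3):
    """Predict the target of x_new as the mean of its k nearest neighbors."""
    distances = np.abs(X - x_new).ravel()   # distance from x_new to every point
    nearest = np.argsort(distances)[:k]     # indices of the k closest points
    return y[nearest].mean()                # average their target values

print(knn_predict(X, y, 2.0))  # (77 + 72 + 60) / 3 ≈ 69.67
```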

3. Distance metrics

1) Euclidean distance: the square root of the sum of the squared differences between a new point (x) and an existing point (y).
2) Manhattan distance: the distance between real-valued vectors, computed as the sum of the absolute differences of their components.
knn_regression_image_3
3) Hamming distance: used for categorical variables. If value (x) and value (y) are the same, the distance D is 0; otherwise D = 1.
knn_regression_image_4
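As a quick illustration, the three metrics can be written out directly. For Hamming distance, the per-position rule above (D = 0 if equal, 1 otherwise) is summed over all positions of the two vectors, which is the usual convention.

```python
import numpy as np

def euclidean(x, y):
    # square root of the sum of squared differences
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def manhattan(x, y):
    # sum of absolute differences
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)))

def hamming(x, y):
    # count positions where the categorical values differ (0 if equal, 1 otherwise)
    return sum(a != b for a, b in zip(x, y))

print(euclidean([0, 0], [3, 4]))                  # 5.0
print(manhattan([0, 0], [3, 4]))                  # 7
print(hamming(["red", "big"], ["red", "small"]))  # 1
```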


4. Python Example

from sklearn.datasets import load_boston
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn import neighbors
from sklearn.metrics import mean_squared_error 
from math import sqrt
import pandas as pd

boston = load_boston()
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.2)
print(len(X_train), len(X_test))

rmse_val = [] #to store rmse values for different k
for K in range(1, 21):
    model = neighbors.KNeighborsRegressor(n_neighbors=K)
    model.fit(X_train, y_train)  #fit the model
    pred=model.predict(X_test) #make prediction on test set
    error = sqrt(mean_squared_error(y_test,pred)) #calculate rmse
    rmse_val.append(error) #store rmse values
    print('RMSE value for k =', K, 'is:', error)

knn_regression_image_5


#plotting the rmse values against k values
curve = pd.DataFrame(rmse_val, index=range(1, 21), columns=['RMSE']) #elbow curve, x-axis = k
curve.plot()
plt.show()

knn_regression_image_6
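Instead of reading the best k off the elbow curve by eye, scikit-learn's GridSearchCV can cross-validate over k directly. Since the Boston dataset has been removed from recent scikit-learn releases, this sketch uses a synthetic stand-in from make_regression; the search itself works the same way on any regression data.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data standing in for the Boston housing set
X, y = make_regression(n_samples=500, n_features=13, noise=10.0, random_state=0)

# Cross-validated search over k = 1..20, scored by RMSE (negated, since
# scikit-learn scorers follow a higher-is-better convention)
grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={'n_neighbors': range(1, 21)},
    scoring='neg_root_mean_squared_error',
    cv=5,
)
grid.fit(X, y)
print('best k:', grid.best_params_['n_neighbors'])
```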



References

https://www.analyticsvidhya.com/blog/2018/08/k-nearest-neighbor-introduction-regression-python/