KNN (K-Nearest Neighbors) assigns a label to a new data point based on its distance to the existing (training) data points.
KNN concept: "birds of a feather flock together" (物以類聚). It works for both discrete (categorical) and continuous data. Principle: find the K nearest neighbours → let them vote → decide the class. Compute each training point's distance to the query point → use K to decide how many neighbours vote (for continuous data, take the average of the neighbours' values instead of voting).
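The steps above can be sketched from scratch; this is a minimal illustration with hypothetical helper names (the notes do not specify an implementation), assuming Euclidean distance:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    # Pair every training point with its Euclidean distance to the query,
    # then sort so the nearest neighbours come first.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k closest neighbours.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # → "A"
print(knn_predict(X, y, (5.5, 5.5), k=3))  # → "B"
```

For regression (continuous targets), the `Counter` vote would be replaced by the mean of the k neighbours' values.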
We want a K value that minimizes error: Error = 1 - Accuracy
Two methods:
(1) Elbow method: plot error against K and pick the K at the "elbow", where increasing K stops reducing error sharply.
(2) Cross-validated grid search: cross validate a grid of multiple K values and choose the K with the lowest error (or highest accuracy).
Cross validation simply picks the K with the lowest error rate across the folds, which may favour a larger K than necessary (more neighbours to tally per prediction). Consider the context of the problem to decide whether a larger K is an issue.
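A sketch of method (2), using leave-one-out cross validation for simplicity over a small grid of candidate K values (helper names are hypothetical; the notes do not prescribe a fold scheme):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k):
    # Majority vote among the k nearest neighbours (Euclidean distance)
    dists = sorted((math.dist(x, query), lab) for x, lab in zip(train_X, train_y))
    return Counter(lab for _, lab in dists[:k]).most_common(1)[0][0]

def loo_error(X, y, k):
    # Leave-one-out error rate: predict each point from all the others
    wrong = sum(
        knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) != y[i]
        for i in range(len(X))
    )
    return wrong / len(X)  # Error = 1 - Accuracy

# Toy data: two balanced clusters of four points each
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["A"] * 4 + ["B"] * 4
errors = {k: loo_error(X, y, k) for k in (1, 3, 5, 7)}
best_k = min(errors, key=errors.get)
```

Note how K = 7 fails completely here: with one point held out, the 7 "neighbours" are the entire rest of the dataset, so the opposing class always outvotes the held-out point's own class.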
Distance metrics: Minkowski (general form), Euclidean (p = 2), Manhattan (p = 1), Chebyshev (p → ∞).
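These metrics are all instances of the Minkowski distance, d(x, y) = (Σ|xᵢ − yᵢ|^p)^(1/p): p = 1 gives Manhattan, p = 2 gives Euclidean, and the limit p → ∞ gives Chebyshev (the largest per-coordinate gap). A small sketch:

```python
def minkowski(x, y, p):
    # Minkowski distance: (sum of |x_i - y_i|^p) ^ (1/p)
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)

def chebyshev(x, y):
    # Limit of Minkowski as p -> infinity: the largest coordinate gap
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))  # Manhattan: 7.0
print(minkowski(x, y, 2))  # Euclidean: 5.0
print(chebyshev(x, y))     # Chebyshev: 4
```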
Summary: (1) The choice of K is critical, and it is best to avoid even values of K (to prevent tied votes). (2) Repeatedly split the sample (cross validation). (3) Choose a suitable distance calculation.
Advantages: (1) Simple and easy to understand. (2) Not restricted by data type. (3) Performs well on multi-class prediction.
Disadvantages: (1) High computational cost. (2) Predictions tend to be inaccurate when the data is imbalanced.
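One common mitigation for the imbalance problem (not covered in the notes above, so this is an illustrative assumption) is distance-weighted voting: each neighbour's vote is weighted by the inverse of its distance, so a few very close minority-class points can outweigh many distant majority-class points.

```python
from collections import defaultdict
import math

def weighted_knn_predict(train_X, train_y, query, k):
    # Sort training points by Euclidean distance to the query.
    dists = sorted((math.dist(x, query), lab) for x, lab in zip(train_X, train_y))
    # Weight each of the k nearest neighbours' votes by 1/distance,
    # so closer points count more than distant ones.
    scores = defaultdict(float)
    for d, lab in dists[:k]:
        scores[lab] += 1.0 / (d + 1e-9)  # epsilon avoids division by zero
    return max(scores, key=scores.get)

# Imbalanced toy data: five "A" points, only two "B" points
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2), (10, 10), (10, 11)]
y = ["A"] * 5 + ["B"] * 2
# A plain majority vote with k=5 would pick "A" (3 A's vs 2 B's among
# the 5 nearest), but the two B's are far closer, so weighting picks "B".
print(weighted_knn_predict(X, y, (10, 10.5), k=5))  # → "B"
```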