caret
This notebook describes an example of using the caret [1] package to conduct hyperparameter tuning for the k-Nearest Neighbour classifier.
The example dataset is the banknote data frame found in the mclust [2] package. It contains six measurements made on 100 genuine and 100 counterfeit old Swiss 1000-franc bank notes.
There are six predictor variables (Length, Left, Right, Bottom, Top, Diagonal), with Status being the categorical response or class variable having two levels, namely genuine and counterfeit.
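The structure described above can be checked directly. The following is a minimal sketch, assuming only that the mclust package is installed; it loads the data frame and inspects the variable types and class labels.

```r
# Load the banknote data frame shipped with the mclust package
data("banknote", package = "mclust")

str(banknote)                          # one factor (Status) and six numeric columns
levels(banknote$Status)                # the two class labels
sapply(banknote[, -1], is.numeric)     # confirm that every predictor is numeric
```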
Observe that the dataset is balanced, with 100 observations for each level of Status.
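library(dplyr)   # %>%, group_by(), summarise(), n()
library(mclust)  # provides the banknote data frame
data("banknote", package = "mclust")

# Per-class counts and mean of each measurement
banknote %>%
  group_by(Status) %>%
  summarise(N = n(),
            Mean_Length = mean(Length),
            Mean_Left = mean(Left),
            Mean_Right = mean(Right),
            Mean_Bottom = mean(Bottom),
            Mean_Top = mean(Top),
            Mean_Diagonal = mean(Diagonal),
            .groups = "keep")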
For most of the measurements other than Length, genuine and counterfeit notes have quite distinct distributions.
library(dplyr)    # %>%, mutate(), n()
library(tidyr)    # pivot_longer()
library(ggplot2)  # plotting

# Reshape to long format (one row per note and dimension),
# then draw one boxplot panel per dimension, split by Status
banknote %>%
  mutate(ID = 1:n()) %>%
  pivot_longer(Length:Diagonal,
               names_to = "Dimension",
               values_to = "Size") %>%
  mutate(Dimension = factor(Dimension),
         ID = factor(ID)) %>%
  ggplot() +
  aes(y = Size, fill = Status) +
  facet_wrap(~ Dimension, scales = "free") +
  geom_boxplot() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  labs(y = "Size (mm)", title = "Comparison of bank note dimensions")
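The visual impression from the boxplots can be complemented with a rough numeric summary. The sketch below is not part of the original analysis: it computes a standardised mean difference between the two classes for each predictor, using a pooled standard deviation (reasonable here because the classes are equally sized). The names predictors and separation are illustrative only.

```r
# Load the banknote data frame shipped with the mclust package
data("banknote", package = "mclust")

predictors <- setdiff(names(banknote), "Status")

# Standardised mean difference (genuine minus counterfeit) per predictor,
# scaled by the pooled standard deviation of the two classes
separation <- sapply(predictors, function(v) {
  genuine     <- banknote[banknote$Status == "genuine", v]
  counterfeit <- banknote[banknote$Status == "counterfeit", v]
  pooled_sd   <- sqrt((var(genuine) + var(counterfeit)) / 2)
  (mean(genuine) - mean(counterfeit)) / pooled_sd
})

# Larger absolute values indicate better-separated classes
round(sort(abs(separation), decreasing = TRUE), 2)
```

Predictors with a large absolute value are those whose boxplots overlap least, so this ordering gives a quick indication of which measurements are likely to be most informative for a distance-based classifier such as k-Nearest Neighbour.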