`caret`

This notebook describes an example of using the `caret`^{1} package to conduct hyperparameter tuning for the k-Nearest Neighbour classifier.

The example dataset is the `banknote` data frame found in the `mclust`^{2} package. It contains six measurements made on 100 genuine and 100 counterfeit old Swiss 1000-franc bank notes.

There are six predictor variables (`Length`, `Left`, `Right`, `Bottom`, `Top`, `Diagonal`), with `Status` being the categorical response or class variable having two levels, namely `genuine` and `counterfeit`.
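As a quick check of the structure described above, the data frame can be inspected directly. This is a minimal sketch that assumes the `mclust` package is installed; it loads only the dataset rather than attaching the whole package.

```
# Load the banknote data without attaching mclust
data("banknote", package = "mclust")

# One factor column (Status) plus six numeric measurements
str(banknote)

# Class labels of the response variable
levels(banknote$Status)

# Number of notes recorded for each class
table(banknote$Status)
```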

Observe that the dataset is balanced, with 100 observations for each level of `Status`.

```
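library(dplyr)
data("banknote", package = "mclust")  # load the example data

# Class counts and per-class means of each measurement
banknote %>%
  group_by(Status) %>%
  summarise(N = n(),
            Mean_Length = mean(Length),
            Mean_Left = mean(Left),
            Mean_Right = mean(Right),
            Mean_Bottom = mean(Bottom),
            Mean_Top = mean(Top),
            Mean_Diagonal = mean(Diagonal),
            .groups = "keep")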
```

For most of the measurements aside from `Length`, genuine and counterfeit notes have quite distinct distributions.

```
library(dplyr)
library(tidyr)
library(ggplot2)

# Reshape to long format: one row per note and dimension
banknote %>%
  mutate(ID = 1:n()) %>%
  pivot_longer(Length:Diagonal,
               names_to = "Dimension",
               values_to = "Size") %>%
  mutate(Dimension = factor(Dimension),
         ID = factor(ID)) %>%
  # One boxplot panel per dimension, split by Status
  ggplot() +
  aes(y = Size, fill = Status) +
  facet_wrap(~ Dimension, scales = "free") +
  geom_boxplot() +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  labs(y = "Size (mm)", title = "Comparison of bank note dimensions")
```