Jan-Philipp Kolb and Alexander Murray-Watters
18 January 2019
Animated example: https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68
Apply kmeans to the iris dataset with 2, 3, and 4 clusters. Produce three scatter plots, with the points colored according to cluster assignment.
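One possible way to approach this exercise is sketched below. It is only a sketch: the choice of the two petal variables for the plot axes, the seed, and `nstart = 20` are assumptions made for illustration, not part of the exercise.

```r
# Use the four numeric columns only; Species is left out of the clustering.
x <- iris[, 1:4]

set.seed(1)  # kmeans starts from random centers, so fix the seed
ks <- c(2, 3, 4)

# Fit one kmeans model per number of clusters.
fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = 20))
names(fits) <- paste0("k", ks)

# One scatter plot per k, with points colored by cluster assignment.
op <- par(mfrow = c(1, 3))
for (i in seq_along(ks)) {
  plot(x$Petal.Length, x$Petal.Width,
       col = fits[[i]]$cluster, pch = 19,
       xlab = "Petal.Length", ylab = "Petal.Width",
       main = paste("kmeans, k =", ks[i]))
}
par(op)
```

Clustering here uses all four measurements, while the plot shows only two of them, so clusters that look overlapping in the plot may still be separated in the other dimensions.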
A fairly new alternative to kmeans, hdbscan does not require you to specify the number of clusters in advance. It only requires a decision as to the minimum number of points needed to form a cluster. This minimum number acts as a smoothing parameter (similar to a density bandwidth parameter or a histogram's bin/bar width), with lower values finding more clusters. Other advantages of hdbscan include .
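To see how the minimum number of points affects the result, a small sketch follows. It assumes the `dbscan` package is installed (it provides `hdbscan()` with a `minPts` argument); the iris data and the three `minPts` values are chosen purely for illustration.

```r
# Assumption: install.packages("dbscan") has been run beforehand.
library(dbscan)

x <- as.matrix(iris[, 1:4])

# Fit hdbscan for a few values of minPts (illustrative values only).
min_pts <- c(5, 10, 25)
fits <- lapply(min_pts, function(m) hdbscan(x, minPts = m))

# Cluster label 0 marks noise points, so it is excluded from the cluster count.
n_clusters <- sapply(fits, function(f) length(setdiff(unique(f$cluster), 0)))
n_noise    <- sapply(fits, function(f) sum(f$cluster == 0))

# Summary table: one row per minPts value.
data.frame(minPts = min_pts, clusters = n_clusters, noise = n_noise)
```

Unlike kmeans, hdbscan can leave points unassigned as noise, so it is worth looking at the noise count alongside the number of clusters when comparing settings.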
– Be sure to try different numbers of centers.
Apply kmeans and hdbscan to the ChickWeight dataset’s “weight” and “Time” variables, and see how well you can get each to perform.