Hierarchical Clustering gives you clusters very similar to K-Means; in fact, sometimes the result can be exactly the same as K-Means clustering, even though the process is quite different. There are two types, Agglomerative and Divisive: Agglomerative is the bottom-up approach, and Divisive is the opposite (top-down). I focused mainly on the Agglomerative approach today; its steps are below, with a small code sketch after them.
Step 1: Make each data point a single-point cluster, forming N clusters.
Step 2: Take the two closest data points and make them one cluster. That forms N-1 clusters.
Step 3: Take the two closest clusters and make them one cluster. That forms N-2 clusters.
Step 4: Repeat Step 3 until there is only one big cluster left.
Step 5: Finish.
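To make these steps concrete, here is a minimal sketch using scikit-learn's AgglomerativeClustering; the three-blob toy data and the choice of 3 clusters are my own assumptions for illustration.

```python
# Minimal agglomerative clustering sketch (made-up data, assumed 3 clusters)
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy 2-D data: three rough blobs
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=(0, 0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(5, 5), scale=0.5, size=(20, 2)),
    rng.normal(loc=(0, 5), scale=0.5, size=(20, 2)),
])

# Bottom-up merging, stopped once 3 clusters remain (Ward linkage, Euclidean distance)
hc = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = hc.fit_predict(X)
print(labels)
```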
Closeness of clusters is different from closeness of data points. Between two points you can measure it directly, for example with the Euclidean distance between them; between two clusters you first have to decide what distance even means, e.g. the distance between their closest points, their furthest points, their centroids, or the average over all pairs of points.
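A tiny numpy sketch of that point, on made-up numbers: two points have one distance, but two clusters have several possible distances depending on the rule you pick.

```python
# Two small made-up clusters on a line, to show the different notions of closeness
import numpy as np

a = np.array([[0.0, 0.0], [1.0, 0.0]])   # cluster A
b = np.array([[4.0, 0.0], [6.0, 0.0]])   # cluster B

# Euclidean distance between every point of A and every point of B
pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

print(pairwise.min())                                    # closest points   -> 3.0
print(pairwise.max())                                    # furthest points  -> 6.0
print(pairwise.mean())                                   # average of pairs -> 4.5
print(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))   # centroids        -> 4.5
```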
I learnt about dendrograms, where the vertical axis shows the Euclidean distance (dissimilarity) at which clusters get merged and the horizontal axis holds the data points. So the higher a merge line sits, the more dissimilar the clusters it joins. We can set a dissimilarity threshold, and the clusters we need are the biggest ones formed below that threshold: the number of vertical lines the threshold cuts in the dendrogram tells us how many clusters we end up with.
In the dendrogram, intuitively, the threshold usually lies across the longest vertical line segment you can find that isn't crossed by any horizontal merge line.
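Here is a sketch of how I would draw the dendrogram and cut it with scipy; the blob data and the way I automate the "largest gap" idea (taking the biggest jump between consecutive merge heights) are my own assumptions.

```python
# Dendrogram sketch: build the merge tree, plot it, then cut at a distance threshold
import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2))
               for c in [(0, 0), (5, 5), (0, 5)]])

Z = sch.linkage(X, method="ward")   # full agglomerative merge history

sch.dendrogram(Z)
plt.xlabel("Data points")
plt.ylabel("Euclidean distance (Ward)")
plt.show()

# Rough automation of the "largest vertical gap" heuristic: cut in the middle of
# the biggest jump between consecutive merge heights
heights = Z[:, 2]
i = np.diff(heights).argmax()
threshold = (heights[i] + heights[i + 1]) / 2
labels = sch.fcluster(Z, t=threshold, criterion="distance")
print("clusters below the threshold:", len(set(labels)))
```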
I used the Ward method to measure the distance between clusters. For each candidate merge, it takes the two clusters, estimates the centroid of the merged cluster, and computes the sum of the squared deviations of all the merged points from that new centroid. Different merges give different sums, and it picks the merge with the smallest one.
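A small sketch of that merge rule as I understand it, on made-up clusters: pool the points of each candidate pair, take the centroid of the pooled points, sum the squared deviations from it, and merge the pair with the smallest sum.

```python
# Toy illustration of the Ward-style merge rule described above
import numpy as np
from itertools import combinations

clusters = {
    "A": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "B": np.array([[0.5, 1.0]]),
    "C": np.array([[8.0, 8.0], [9.0, 8.0]]),
}

def merge_cost(points):
    """Sum of squared deviations of the pooled points from their centroid."""
    centroid = points.mean(axis=0)
    return float(((points - centroid) ** 2).sum())

costs = {
    (i, j): merge_cost(np.vstack([clusters[i], clusters[j]]))
    for i, j in combinations(clusters, 2)
}
best = min(costs, key=costs.get)
print(costs)          # A+B is by far the cheapest merge
print("merge:", best)
```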