DBSCAN (Week 7: Monday) – Curves & Confidence: A Math Stats Explorer's Log

Today I learnt how and when to implement DBSCAN (Density-Based Spatial Clustering of Applications with Noise). It is a clustering algorithm that identifies clusters of data points based on how densely they are packed in space. Unlike k-means, it does not require us to specify the number of clusters beforehand, and it can discover clusters of arbitrary shape.

According to DBSCAN, a cluster is a dense region of data points separated from other clusters by sparser regions. DBSCAN classifies every point into one of three categories: core, border, or noise points.
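To make the three categories concrete, here is a minimal brute-force sketch in plain Python. It is my own illustration, not a library API: `classify_points` is a hypothetical helper, and it assumes Euclidean distance, a neighborhood radius `eps`, and a threshold `min_samples` that counts the point itself (scikit-learn's convention).

```python
import math


def classify_points(points, eps, min_samples):
    """Label each point 'core', 'border', or 'noise' (brute-force, O(n^2))."""
    # Neighborhood of point i: every point within distance eps, including i itself.
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    # A core point has at least min_samples points in its neighborhood.
    is_core = [len(nbrs) >= min_samples for nbrs in neighbors]

    labels = []
    for i in range(len(points)):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbors[i]):
            # Not dense enough itself, but inside some core point's neighborhood.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
    print(classify_points(pts, eps=1.5, min_samples=4))
    # → ['core', 'core', 'core', 'core', 'border', 'noise']
```

Here (2, 1) has only 3 points within 1.5 (itself, (1, 0) and (1, 1)), so it is not core, but it sits next to core points and becomes a border point; (5, 5) is isolated and is noise.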

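A core point has at least min_samples neighboring points within a specified distance, epsilon (eps). In scikit-learn this count includes the point itself. A border point has fewer than min_samples neighbors but falls within the eps-neighborhood of a core point. All remaining points are noise points and do not belong to any cluster.

DBSCAN first picks an arbitrary unvisited point. If it is a core point, DBSCAN starts a new cluster and adds all of its neighbors within eps to that cluster; if it is not, the point is provisionally labeled noise and may later join a cluster as a border point. The neighbors added to a cluster can be core or border points. The same process is then repeated for every neighbor that is itself a core point, adding its reachable neighbors to the cluster, while border points join the cluster without expanding it further. This continues until no more points can be added. DBSCAN then moves on to the next unvisited point and repeats the whole process. One of its main advantages is robustness to outliers: they are labeled as noise instead of being forced into a cluster.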
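The expansion procedure can be sketched from scratch in plain Python. This is a minimal O(n^2) sketch, not a reference implementation: it assumes Euclidean distance, visits points in index order instead of randomly, and the function name `dbscan` plus the `-1` noise label are my own choices (the `-1` happens to match scikit-learn's convention).

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_samples):
    """Return a cluster id (0, 1, ...) for each point, or -1 for noise."""
    n = len(points)

    def region_query(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None means "not visited yet"
    cluster_id = 0
    for i in range(n):  # index order stands in for an arbitrary visiting order
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # provisional: may later become a border point
            continue
        # i is a core point: start a new cluster and grow it outward.
        labels[i] = cluster_id
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # former noise point becomes a border point
            if labels[j] is not None:
                continue  # already assigned (or just turned into a border point)
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_samples:
                # Only core points pass their neighborhood on for expansion.
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (5, 5)]
    print(dbscan(pts, eps=1.5, min_samples=4))
    # → [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Two clusters are found without ever specifying k, and the isolated point (5, 5) is left as noise. A border point reachable from two clusters goes to whichever cluster reaches it first, so results can depend on visiting order. In practice you would usually reach for scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=...)`, whose `fit_predict` also marks noise as -1 and which can use tree-based neighbor search (its `algorithm` parameter) instead of the brute-force scan used here.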
