What Are the Different Types of Clustering Algorithms?
Clustering algorithms come in a variety of shapes and sizes. Because there are probably over 100 documented clustering methods, the following overview covers only the most well-known examples. Not all of them provide a model for their clusters, which makes them difficult to classify.
Methods based on distribution
In this clustering model, data points are grouped according to the likelihood that they come from the same probability distribution, typically a normal (Gaussian) distribution. A common setup fixes the number of Gaussian distributions in advance and then fits their parameters so that the data is as likely as possible under the resulting model. This results in the grouping illustrated in the diagram.
This approach works well with synthetic data and with clusters of various sizes. However, unless constraints are applied to restrict the model’s complexity, it is prone to overfitting. Furthermore, distribution-based clustering presupposes a concisely specified mathematical model underlying the data, which is a strong assumption for some data distributions.
One prominent example of this technique is the expectation-maximization algorithm, which employs multivariate normal distributions.
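To make the expectation-maximization idea concrete, here is a minimal sketch in plain Python. It is deliberately simplified: it fits one-dimensional Gaussians rather than the multivariate ones mentioned above, and the function names, the toy data, the starting means, and the variance floor are all illustrative assumptions rather than part of any particular library.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, init_means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM (simplified, illustrative sketch)."""
    k = len(init_means)
    means = list(init_means)
    variances = [1.0] * k          # assumed starting variances
    weights = [1.0 / k] * k        # start with equal mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps a component from collapsing
    # Hard assignment: each point goes to its most responsible component
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels

# Toy data: two groups, around 1 and around 5 (made up for illustration)
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances, labels = em_gmm_1d(data, init_means=[0.0, 6.0])
print(means, labels)
```

The variance floor is one simple example of the kind of constraint that keeps the model’s complexity in check; without it, a component can shrink onto a single point.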
Methods based on the centroid
This is an iterative clustering technique in which clusters are formed according to the distance between the data points and each cluster’s centroid. The cluster centre, or centroid, is placed so that the distance between the data points and the centre is as small as possible. Because this optimisation problem is NP-hard, solutions are usually approximated, often over a number of attempts.
The k-means algorithm is a well-known example of this method.
The most significant flaw in this approach is that the number of clusters, k, must be defined in advance. Clusters that are defined by density rather than by distance to a centre are also a problem.
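The iterative procedure described above can be sketched with Lloyd's algorithm, the usual heuristic for k-means. This is a minimal plain-Python version, not a reference implementation: the function names, the toy points, and the restart count are assumptions made for illustration.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """One run of Lloyd's algorithm from a random initialisation."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # pick k distinct points as starting centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:        # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:                 # leave an empty cluster's centroid in place
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Sum of squared distances to the assigned centroid (lower is better)
    inertia = sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, inertia

def kmeans_restarts(points, k, n_restarts=10):
    """Run several attempts with different seeds and keep the best one."""
    return min((kmeans(points, k, seed=s) for s in range(n_restarts)),
               key=lambda result: result[2])

# Toy data: two well-separated square blobs (made up for illustration)
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels, inertia = kmeans_restarts(points, k=2)
print(centroids, labels, inertia)
```

Note that k is an argument the caller has to supply, which is exactly the limitation mentioned above. Keeping the best of several restarts reflects the fact that a single run only finds a local optimum.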
Methods based on connectivity
The basic concept of a connectivity-based model is similar to that of a centroid-based model: both define clusters based on the proximity of data points. The working assumption is that data points that are closer to each other behave more similarly than data points that are farther apart. Rather than a single partition of the data set, such a model provides an extensive hierarchy of clusters that merge with one another at specific distances. The choice of distance function is left to the user and is therefore subjective. These models are simple to understand, but they do not scale well.
Methods based on density
In this clustering model, the data space is searched for regions with different densities of data points. The model isolates regions of differing density within the data space and treats the dense regions as clusters.
DBSCAN and OPTICS are two examples.
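To make the density idea concrete, here is a minimal sketch of DBSCAN in plain Python. It uses a brute-force neighbour search, so it is meant for reading rather than for large data sets. The parameter names `eps` (neighbourhood radius) and `min_pts` (minimum number of neighbours for a dense point), the `NOISE` marker, and the toy points are choices made here for illustration.

```python
import math

NOISE = -1  # label used for points that belong to no cluster

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)    # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE        # may later be relabelled as a border point
            continue
        cluster += 1                 # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

# Toy data: two dense squares and one isolated point (made up for illustration)
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (10, 0)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike the centroid-based sketch, the number of clusters is not passed in; it follows from the density parameters, and isolated points are reported as noise.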
Clustering of subspaces
Subspace clustering is an unsupervised learning task that tries to arrange data points into multiple clusters so that the data points in each cluster lie on a common low-dimensional linear subspace. Subspace clustering is an extension of feature selection: like feature selection, it requires a search technique and assessment criteria, but it additionally restricts the scope of those criteria. It localises the search for relevant dimensions, which allows it to locate clusters that exist across several overlapping subspaces.
Subspace clustering was originally developed to tackle a very particular computer vision problem in which the data has a union-of-subspaces structure, but it is gaining popularity in the statistics and machine learning communities. The technique is used in social networks, movie recommendations, and biological data sets. Because many of these applications deal with sensitive data, subspace clustering raises concerns about data privacy. Data points are presumed to be incoherent, because the approach only protects the differential privacy of individual features of a user rather than the user’s full profile in the database.
Based on their search technique, subspace clustering methods fall into two types.
- Top-down methods find an initial clustering in the full set of dimensions and then assess the subspace of each cluster.
- Bottom-up methods identify dense regions in low-dimensional spaces, which are subsequently combined to form clusters.
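The bottom-up idea can be sketched with a simple grid, loosely in the spirit of grid-based methods such as CLIQUE. This is a heavily simplified illustration, not that algorithm: it only finds dense grid cells in one and two dimensions and does not go on to join neighbouring cells into final clusters. The function names, the bin count, the density threshold, and the toy data (assumed to lie in the range 0 to 1) are all assumptions.

```python
from collections import Counter
from itertools import combinations

def dense_units(points, dims, n_bins, min_count, low=0.0, high=1.0):
    """Grid cells of the subspace `dims` that hold at least min_count points."""
    width = (high - low) / n_bins
    counts = Counter()
    for p in points:
        # Map each selected coordinate to its bin index (clamped to the last bin)
        cell = tuple(min(int((p[d] - low) / width), n_bins - 1) for d in dims)
        counts[cell] += 1
    return {cell for cell, c in counts.items() if c >= min_count}

def bottom_up_dense_subspaces(points, n_bins=4, min_count=3):
    """Find dense 1-D units, then combine dense dimensions into 2-D candidates."""
    n_dims = len(points[0])
    dense = {}
    # Step 1: dense units in each single dimension
    for d in range(n_dims):
        units = dense_units(points, (d,), n_bins, min_count)
        if units:
            dense[(d,)] = units
    # Step 2: a 2-D cell can only be dense if both of its 1-D projections are,
    # so only pairs of dimensions that survived step 1 are examined
    for d1, d2 in combinations(range(n_dims), 2):
        if (d1,) in dense and (d2,) in dense:
            units = dense_units(points, (d1, d2), n_bins, min_count)
            if units:
                dense[(d1, d2)] = units
    return dense

# Toy data: four points are close in dimensions 0 and 1 but spread out in dimension 2
points = [(0.10, 0.10, 0.1), (0.12, 0.15, 0.4), (0.20, 0.05, 0.6), (0.15, 0.20, 0.9),
          (0.60, 0.90, 0.3), (0.90, 0.40, 0.8)]
print(bottom_up_dense_subspaces(points))
```

In this toy example the dense region only shows up in the subspace spanned by dimensions 0 and 1; dimension 2 contributes no dense cell and is pruned before any pair involving it is examined.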