Cluster analysis is a commonly used technique (or set of techniques) for identifying structure in data when such structure is unknown a priori.
More specifically, cluster analysis is the classification of sets of multivariate data into groups or clusters of similar samples. Most standard clustering methods fall into one of two categories, namely (i) partitional methods, and (ii) hierarchical methods.
In partitional clustering, every data sample
is initially assigned to a cluster in some (possibly random) way. Samples
are then iteratively transferred from cluster to cluster until some criterion
function is minimised. Once the process is complete, the samples will have
been partitioned into separate compact clusters. Examples of partitional
clustering methods are k-means and Lloyd's method.
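The partitional procedure described above can be illustrated with a minimal k-means-style sketch. It assumes samples are tuples of floats and that the criterion function is the within-cluster sum of squared Euclidean distances. The function names (`partition`, `criterion`), the `seed` and `max_iter` parameters, and the reseeding of empty clusters are illustrative choices rather than anything prescribed by the text.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of samples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def compute_centres(samples, labels, k, rng):
    """Centroid of each cluster; an empty cluster is reseeded (assumption)."""
    centres = []
    for c in range(k):
        members = [s for s, lab in zip(samples, labels) if lab == c]
        centres.append(centroid(members) if members else rng.choice(samples))
    return centres


def partition(samples, k, seed=0, max_iter=100):
    """Assign samples to k clusters by iterative reassignment."""
    rng = random.Random(seed)
    # Initial assignment: every sample gets a random cluster label.
    labels = [rng.randrange(k) for _ in samples]
    for _ in range(max_iter):
        centres = compute_centres(samples, labels, k, rng)
        # Transfer each sample to the cluster with the nearest centre.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centres[c]))
            for s in samples
        ]
        if new_labels == labels:  # no sample moved: a fixed point is reached
            break
        labels = new_labels
    return labels


def criterion(samples, labels, k):
    """Within-cluster sum of squared distances (the assumed criterion)."""
    total = 0.0
    for c in range(k):
        members = [s for s, lab in zip(samples, labels) if lab == c]
        if members:
            centre = centroid(members)
            total += sum(squared_distance(s, centre) for s in members)
    return total
```

For example, calling `partition(samples, k=2)` on two well-separated groups of points should place each group in its own cluster, although which group receives label 0 depends on the random initial assignment. In general the procedure is only guaranteed to reach a local minimum of the criterion.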
In hierarchical clustering, each sample
is initially considered a member of its own cluster, after which clusters
are recursively combined in pairs according to some predetermined condition
until eventually every point belongs to a single cluster. The resulting
hierarchical structure may be represented by a binary tree or "dendrogram",
from which the desired clusters may be extracted. Examples of hierarchical
clustering methods are the single-link, Ward's, centroid, complete-link,
group average, median, and parametric Lance-Williams methods.
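As an illustration of the agglomerative procedure, the following is a minimal sketch using the single-link condition, under which the distance between two clusters is the smallest distance between any pair of their members. The function names (`agglomerate`, `cut`) and the representation of the tree as an ordered merge history are assumptions made for this example. The brute-force search is meant for clarity, not efficiency.

```python
import math


def single_link_distance(a, b, samples):
    """Smallest pairwise distance between members of clusters a and b."""
    return min(math.dist(samples[i], samples[j]) for i in a for j in b)


def agglomerate(samples):
    """Merge clusters in pairs until one remains.

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample indices.
    """
    # Each sample starts as the sole member of its own cluster.
    clusters = [(i,) for i in range(len(samples))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the single-link condition.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = single_link_distance(clusters[p], clusters[q], samples)
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append((clusters[p], clusters[q], d))
        merged = clusters[p] + clusters[q]
        clusters = [c for i, c in enumerate(clusters) if i not in (p, q)]
        clusters.append(merged)
    return merges


def cut(merges, n_samples, k):
    """Recover k clusters by replaying only the first n_samples - k merges."""
    clusters = {(i,) for i in range(n_samples)}
    for a, b, _ in merges[: n_samples - k]:
        clusters -= {a, b}
        clusters.add(a + b)
    return sorted(sorted(c) for c in clusters)
```

The merge history plays the role of the dendrogram: each entry is one internal node of the binary tree, and `cut` extracts a flat clustering by stopping the replay early. For six samples forming two well-separated groups of three, `cut(agglomerate(samples), 6, 2)` should return the two groups as lists of sample indices.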