
DENSITY ESTIMATION

Spatial binning methods simply partition the space into regular blocks and count the number of samples in each block. The population density estimate within each block is the number of samples per unit volume of that block.
Properties: Only discrete density estimates are obtained, but they are normalised. There is an awkward trade-off between bin size and the quality of the density estimate (small bins give noisy counts, large bins blur detail), leaving the problem of how to find the optimal bin size. Quality is very poor with small samples or when the number of variables is large. The method is only really suitable for purely categorical data, which are already naturally discrete.
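The binning procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Chameleon's implementation: the function name binned_density, the parameter bin_width and the sample data are all assumptions made for the example, and the blocks are taken to be equal-sided cubes aligned with the origin.

```python
from collections import Counter
from math import floor

def binned_density(samples, bin_width):
    """Return a function giving samples per unit volume in the block containing a point.

    Hypothetical helper: blocks are cubes of side bin_width aligned with the origin.
    """
    dim = len(samples[0])
    bin_volume = bin_width ** dim
    # Count how many samples fall in each block, keyed by integer block indices.
    counts = Counter(tuple(floor(x / bin_width) for x in s) for s in samples)

    def density(point):
        key = tuple(floor(x / bin_width) for x in point)
        # A Counter returns 0 for blocks that contain no samples.
        return counts[key] / bin_volume

    return density

# Made-up two-variable sample, purely for illustration.
samples = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.1), (1.2, 1.7), (1.9, 1.1)]

coarse = binned_density(samples, bin_width=1.0)
fine = binned_density(samples, bin_width=0.5)

print(coarse((0.5, 0.5)))  # block [0,1) x [0,1) holds three samples
print(coarse((0.5, 1.5)))  # an empty block gives a density of zero
print(fine((0.2, 0.2)))    # same region, smaller blocks, different estimate
```

Running the sketch with two bin widths on the same data shows how strongly the estimate at a given point depends on the chosen bin size, which is the trade-off noted in the properties above.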

In nearest-neighbour methods, the population density estimate for a test point is obtained by measuring the volume V of the smallest ball, centred on the test point, that contains its k nearest sample points. The associated density estimate is given by the ratio k/V.

Properties: The overall shapes of population distributions are generally modelled well. The density estimate tends to be good inside the clusters, where sample points are plentiful, but it is overestimated in the tails, and as a result the overall distribution is non-integrable. The density estimates also show sudden, sharp fluctuations. As with kernel methods, small values of k tend to overfit the sample, while large values oversmooth, leaving the problem of selecting the optimal value for the parameter.
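The k/V estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not Chameleon's implementation: the names ball_volume and knn_density are hypothetical, distances are Euclidean, the ball is centred on the test point, and the search is a brute-force sort over all samples.

```python
from math import dist, gamma, pi

def ball_volume(radius, dim):
    """Volume of a Euclidean ball of the given radius in dim dimensions."""
    return pi ** (dim / 2) / gamma(dim / 2 + 1) * radius ** dim

def knn_density(samples, point, k):
    """Hypothetical helper: k divided by the volume of the ball reaching the k-th nearest sample."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Brute-force search: sort all distances and take the k-th smallest as the radius.
    distances = sorted(dist(s, point) for s in samples)
    radius = distances[k - 1]
    # Assumes radius > 0; a test point coinciding with k samples would divide by zero.
    return k / ball_volume(radius, len(point))

# Made-up one-variable sample, purely for illustration.
samples = [(0.0,), (1.0,), (2.0,), (4.0,), (8.0,)]

inside = knn_density(samples, (1.0,), k=3)    # test point among the samples
tail = knn_density(samples, (100.0,), k=3)    # test point far out in the tail
print(inside, tail)
```

In this sketch the estimate far from the data is small but never zero: the radius grows roughly in proportion to the distance from the sample, so the estimate falls off only as the inverse of the ball volume. That slow decay is what makes the tails too heavy for the estimate to be integrated over the whole space.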