[Continued...]
Spatial binning methods simply
partition the space into regular blocks and count the number of samples
in each block. The population density estimate within each block is given
by the number of samples per unit volume for the block.
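The block-counting step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bin_density`, the cubic blocks of side `bin_width`, and the toy sample are all hypothetical choices, not part of the original method description.

```python
from collections import Counter
from math import floor

def bin_density(samples, bin_width):
    """Map each occupied block (a tuple of integer indices) to its density.

    Assumes cubic blocks of side `bin_width` aligned to the origin.
    Density here is samples per unit volume, as in the text.
    """
    d = len(samples[0])               # number of variables (dimensions)
    block_volume = bin_width ** d
    # Assign every sample to the block that contains it and count per block.
    counts = Counter(
        tuple(floor(x / bin_width) for x in point) for point in samples
    )
    return {block: c / block_volume for block, c in counts.items()}

# Hypothetical two-dimensional sample of four points.
samples = [(0.1, 0.2), (0.4, 0.3), (0.6, 0.1), (1.2, 1.7)]
density = bin_density(samples, 0.5)
```

Summing density times block volume over all blocks recovers the sample size, so dividing each value by the number of samples would turn the estimate into a probability density that integrates to one.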
Properties: Only discrete density estimates
are obtained, but they are normalised. There is an awkward trade-off
between bin size and the quality of the density estimate: small bins give
noisy estimates, while large bins hide detail. This leaves the problem of
how to find the optimal bin size. Very poor quality is obtained with small
samples or when the number of variables is large. The method is only really
suitable for purely categorical data, which are already naturally discrete.
In nearest-neighbour methods, the population density estimate for a test
point is obtained by measuring the volume V of the ball, centred on the
test point, that just contains its k nearest sample points. The associated
density estimate is given by the ratio k/V.
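As a rough sketch of the k/V ratio described above, the following Python computes the estimate at a single test point. Several details are assumptions made for illustration: Euclidean distance, the names `ball_volume` and `knn_density`, a brute-force search over all samples, and the toy data.

```python
from math import dist, gamma, pi

def ball_volume(radius, d):
    """Volume of a d-dimensional Euclidean ball of the given radius."""
    return pi ** (d / 2) / gamma(d / 2 + 1) * radius ** d

def knn_density(query, samples, k):
    """Estimate the density at `query` as k / V.

    V is the volume of the ball centred on `query` whose radius is the
    distance to the k-th nearest sample. Assumes that radius is non-zero.
    """
    distances = sorted(dist(query, p) for p in samples)
    radius = distances[k - 1]
    return k / ball_volume(radius, len(query))

# Hypothetical one-dimensional sample.
samples = [(0.0,), (1.0,), (2.0,), (4.0,)]
inside = knn_density((1.5,), samples, k=2)   # test point within the sample
tail = knn_density((10.0,), samples, k=2)    # test point far out in the tail
```

The value returned is in samples per unit volume; dividing by the number of samples gives the usual probability-density form k/(nV). In the tail the estimate shrinks only as fast as the ball grows, which is why it never falls off quickly enough to be integrable.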
Properties: The overall shapes of population distributions are generally modelled well. The density estimate tends to be good inside clusters, where sample points are plentiful, but it is overestimated in the tails, so the overall estimate is not integrable. The density estimates also show sudden sharp fluctuations. As with kernel methods, small values of k tend to overfit the sample while large values oversmooth, leaving the problem of selecting the optimal value of k. [Continued...]