Cite

[1] K. Beyer, J. Goldstein, R. Ramakrishnan, e U. Shaft, «When Is “Nearest Neighbor” Meaningful?», in Database Theory — ICDT’99, C. Beeri e P. Buneman, A c. di, Berlin, Heidelberg: Springer, 1999, pp. 217–235. doi: 10.1007/3-540-49257-7_15.

Synthesis

Contribution::

Strong Points::

Weak Points::

Related::

Abstract

We explore the effect of dimensionality on the “nearest neighbor” problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10–15 dimensions. .

Notes

If

then for every :

Basically, if the condition holds (and it holds quite frequently) all points converge to the same distance to the query point

The condition holds for:

  • iid dimensions with query and data independence
  • unique dimensions with correlation between all dimensions
  • variance converging to 0
  • marginal data and query distributions change with dimensionality

do not hold for:

  • identical dimensions with no independence points fall on a line.

Annotations