Cite

[1] M. Datar, N. Immorlica, P. Indyk, e V. S. Mirrokni, «Locality-sensitive hashing scheme based on p-stable distributions», in Proceedings of the twentieth annual symposium on Computational geometry, in SCG ’04. New York, NY, USA: Association for Computing Machinery, giu. 2004, pp. 253–262. doi: 10.1145/997817.997857.

Synthesis

Contribution::

Strong Points::

Weak Points::

Related::

Abstract

We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under lp norm, based on p-stable distributions.Our scheme improves the running time of the earlier algorithm for the case of the lp norm. It also yields the first known provably efficient approximate NN algorithm for the case p<1. We also show that the algorithm finds the exact near neigbhor in O(log n) time for data satisfying certain “bounded growth” condition.Unlike earlier schemes, our LSH scheme works directly on points in the Euclidean space without embeddings. Consequently, the resulting query time bound is free of large factors and is simple and easy to implement. Our experiments (on synthetic data sets) show that the our data structure is up to 40 times faster than kd-tree. .

Notes

LSH alternative definition

A family is called -sensitive for D (distance measure) if for any :

  • if then
  • if then

uses the p-stable distribution (in particular the dot product where is a random vector of dimension where each entry is chosen independently from a p-stable distribution) to assign the has value to each vector

The dot product projects each vector to the real line. Then chop the real line into equi-width segments and assign hash values to vectors based on which segment they project to.

The expression is the following:

where is a parameter that regulates the width of the buckets..

Annotations