Cite

[1] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni, "Locality-sensitive hashing scheme based on p-stable distributions," in Proceedings of the twentieth annual symposium on Computational geometry, in SCG '04. New York, NY, USA: Association for Computing Machinery, Jun. 2004, pp. 253–262. doi: 10.1145/997817.997857.

Synthesis

Contribution::

Strong Points::

Weak Points::

Related::

Metadata

FirstAuthor:: Datar, Mayur
Author:: Immorlica, Nicole
Author:: Indyk, Piotr
Author:: Mirrokni, Vahab S.
Title:: Locality-sensitive hashing scheme based on p-stable distributions
Year:: 2004
Citekey:: datarLocalitysensitiveHashingScheme2004
itemType:: conferencePaper
Publisher:: Association for Computing Machinery
Location:: New York, NY, USA
Pages:: 253–262
DOI:: 10.1145/997817.997857
ISBN:: 978-1-58113-885-6

LINK

Datar et al. - 2004 - Locality-sensitive hashing scheme based on p-stabl.pdf.

Abstract

We present a novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under $l_p$ norm, based on p-stable distributions. Our scheme improves the running time of the earlier algorithm for the case of the $l_2$ norm. It also yields the first known provably efficient approximate NN algorithm for the case $p<1$. We also show that the algorithm finds the exact near neighbor in $O(\log n)$ time for data satisfying a certain "bounded growth" condition. Unlike earlier schemes, our LSH scheme works directly on points in the Euclidean space without embeddings. Consequently, the resulting query time bound is free of large factors and is simple and easy to implement. Our experiments (on synthetic data sets) show that our data structure is up to 40 times faster than kd-tree.

Notes

LSH alternative definition

A family $H = \{h : S \to U\}$ is called $(r_{1}, r_{2}, p_{1}, p_{2})$-sensitive for a distance measure $D$ if, for any $v, q \in S$:

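if $v \in B(q, r_{1})$ then $P_{H}[h(q) = h(v)] \geq p_{1}$
if $v \notin B(q, r_{2})$ then $P_{H}[h(q) = h(v)] \leq p_{2}$

where $B(q, r)$ is the ball of radius $r$ centered at $q$ under $D$. For the family to be useful, it must satisfy $r_{1} < r_{2}$ and $p_{1} > p_{2}$.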

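The scheme assigns a hash value to each vector $v$ using p-stable distributions, specifically through the dot product $a \cdot v$, where $a$ is a random $d$-dimensional vector whose entries are chosen independently from a p-stable distribution. By p-stability, $a \cdot v$ is distributed as $\|v\|_{p} X$ with $X$ p-stable, so $a \cdot v_{1} - a \cdot v_{2}$ is distributed as $\|v_{1} - v_{2}\|_{p} X$: vectors that are close in $l_p$ tend to have close projections.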
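A minimal sketch of the projection step, assuming only $p = 1$ (standard Cauchy) and $p = 2$ (standard Gaussian), the two p-stable cases with simple samplers. The helper names (`sample_p_stable`, `random_projection_vector`, `lp_norm`, `MEDIAN_ABS`) are hypothetical. It checks p-stability empirically: since $a \cdot v$ should behave like $\|v\|_p X$, the median of $|a \cdot v|$ should approach $\|v\|_p$ times the median of $|X|$ (exactly 1 for the Cauchy, about 0.6745 for the Gaussian).

```python
import math
import random
import statistics
from typing import List, Sequence


def sample_p_stable(p: int, rng: random.Random) -> float:
    """One draw from a standard p-stable distribution; this sketch supports only p = 1 and p = 2."""
    if p == 2:
        return rng.gauss(0.0, 1.0)  # standard Gaussian is 2-stable
    if p == 1:
        # Standard Cauchy is 1-stable; sample it via the inverse CDF.
        return math.tan(math.pi * (rng.random() - 0.5))
    raise ValueError("only p = 1 (Cauchy) and p = 2 (Gaussian) are covered here")


def random_projection_vector(d: int, p: int, rng: random.Random) -> List[float]:
    """A vector a in R^d whose entries are i.i.d. p-stable."""
    return [sample_p_stable(p, rng) for _ in range(d)]


def dot(a: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, v))


def lp_norm(v: Sequence[float], p: int) -> float:
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


# Median of |X| for the standard distributions: 1 for Cauchy, Phi^{-1}(0.75) for Gaussian.
MEDIAN_ABS = {1: 1.0, 2: 0.6744897501960817}

if __name__ == "__main__":
    rng = random.Random(0)
    v = [3.0, -4.0, 1.0]
    for p in (1, 2):
        # p-stability: a . v ~ ||v||_p * X, so median |a . v| ~ ||v||_p * median |X|.
        samples = [abs(dot(random_projection_vector(len(v), p, rng), v)) for _ in range(20000)]
        print(p, round(statistics.median(samples), 3), round(lp_norm(v, p) * MEDIAN_ABS[p], 3))
```

The Cauchy and Gaussian are used here only because they have closed-form samplers; other values of $p \in (0, 2]$ need a general stable-distribution sampler, which this sketch does not include.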

The dot product projects each vector onto the real line. The real line is then chopped into equi-width segments, and each vector receives as its hash value the segment its projection falls into.

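The hash function is:

$$h_{a,b}(v) = \left\lfloor \frac{a \cdot v + b}{r} \right\rfloor$$

where $r$ is a parameter that regulates the width of the buckets and $b$ is a real number chosen uniformly at random from $[0, r]$.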
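A minimal sketch of one hash function $h_{a,b}$ for the $p = 2$ case, assuming Gaussian entries for $a$ and $b$ drawn uniformly from $[0, r]$; the names `PStableHash`, `collision_rate` and `gaussian_collision_probability` are hypothetical. To tie it back to the $(r_{1}, r_{2}, p_{1}, p_{2})$ definition, it estimates $P[h(q) = h(v)]$ by sampling fresh hash functions and compares the estimate with the closed-form collision probability for the Gaussian case, obtained by integrating over the uniform offset $b$: $p(c) = 1 - 2\Phi(-r/c) - \frac{2}{\sqrt{2\pi}\, r/c}\left(1 - e^{-r^2/(2c^2)}\right)$, with $c = \|q - v\|_2$ and $\Phi$ the standard normal CDF.

```python
import math
import random
from typing import List, Sequence


class PStableHash:
    """One hash h_{a,b}(v) = floor((a . v + b) / r), sketched for the p = 2 (Gaussian) case."""

    def __init__(self, d: int, r: float, rng: random.Random) -> None:
        if r <= 0:
            raise ValueError("bucket width r must be positive")
        # a: d i.i.d. standard Gaussian entries (Gaussian is 2-stable).
        self.a: List[float] = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # b: random shift of the segment boundaries, uniform on [0, r].
        self.b: float = rng.uniform(0.0, r)
        self.r = r

    def __call__(self, v: Sequence[float]) -> int:
        # Project v onto the real line, shift by b, and return the segment index.
        projection = sum(ai * vi for ai, vi in zip(self.a, v))
        return math.floor((projection + self.b) / self.r)


def collision_rate(q: Sequence[float], v: Sequence[float], r: float,
                   trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P_H[h(q) = h(v)] over freshly sampled hash functions."""
    hits = 0
    for _ in range(trials):
        h = PStableHash(len(q), r, rng)
        if h(q) == h(v):
            hits += 1
    return hits / trials


def gaussian_collision_probability(c: float, r: float) -> float:
    """Closed-form P[h(q) = h(v)] for p = 2, with c = ||q - v||_2 and bucket width r."""
    if c == 0:
        return 1.0
    ratio = r / c
    phi_neg = 0.5 * (1.0 + math.erf(-ratio / math.sqrt(2.0)))  # standard normal CDF at -r/c
    return (1.0 - 2.0 * phi_neg
            - (2.0 / (math.sqrt(2.0 * math.pi) * ratio)) * (1.0 - math.exp(-ratio ** 2 / 2.0)))


if __name__ == "__main__":
    rng = random.Random(42)
    q = [0.0, 0.0, 0.0, 0.0]
    near = [0.5, 0.0, 0.0, 0.0]  # distance 0.5 from q
    far = [4.0, 0.0, 0.0, 0.0]   # distance 4.0 from q
    for name, v, c in [("near", near, 0.5), ("far", far, 4.0)]:
        est = collision_rate(q, v, r=4.0, trials=5000, rng=rng)
        print(name, round(est, 3), round(gaussian_collision_probability(c, 4.0), 3))
```

With $r = 4$, the closed form gives roughly 0.90 for the near pair and 0.37 for the far pair. Since $p(c)$ decreases as $c$ grows for a fixed $r$, near pairs collide more often than far ones, which is the $p_{1} > p_{2}$ gap the definition asks for.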
