Cite

[1] M. Aumüller, F. Boninsegna, e F. Silvestri, «A Simple Linear Space Data Structure for ANN with Application in Differential Privacy», 11 settembre 2024, arXiv: arXiv:2409.07187. doi: 10.48550/arXiv.2409.07187.

Synthesis

Contribution::

Strong Points::

Weak Points::

Related::

Metadata

FirstAuthor:: Aumüller, Martin
Author:: Boninsegna, Fabrizio
Author:: Silvestri, Francesco
~
Title:: A Simple Linear Space Data Structure for ANN with Application in Differential Privacy
Year:: 2024
Citekey:: aumullerSimpleLinearSpace2024
itemType:: preprint
DOI:: 10.48550/arXiv.2409.07187

LINK

Preprint PDF.

Abstract

Locality Sensitive Filters are known for offering a quasi-linear space data structure with rigorous guarantees for the Approximate Near Neighbor search problem. Building on Locality Sensitive Filters, we derive a simple data structure for the Approximate Near Neighbor Counting problem under differential privacy. Moreover, we provide a simple analysis leveraging a connection with concomitant statistics and extreme value theory. Our approach achieves the same performance as the recent findings of Andoni et al. (NeurIPS 2023) but with a more straightforward method. As a side result, the paper provides a more compact description and analysis of Locality Sensitive Filters for Approximate Near Neighbor Search under inner product similarity, improving a previous result in Aum”{u}ller et al. (TODS 2022). .

Notes

La struttura dati è una hash table, con chiave l’indice i di $a_{i}$ che produce il massimo prodotto vettoriale tra $a_{i}$ e $x$ dove:

$a_{i}$ è un vettore random estratto dalla Gaussiana (se ne estraggono m, che sono anche il numero di chiavi della hash table)
$x$ è il vettore di input

I bucket sono definiti da:

${i \in [m] : ⟨ a_{i}, q ⟩ \geq η}$

dunque tutte le chiavi della hash table tale che il prodotto vettoriale tra il vettore random e la query sia maggiore di quel fattore.

Intuizione del perché funziona:

teorema 2, se associamo il vettore Gaussiano più vicino $a_{x} = ar g max ⟨ a, x ⟩$ troveremo x associato ad un vettore Gaussiano con prodotto interno $⟨ q, a_{x} ⟩ \sim δ 2 lo g m$

LSH suffers from large space requirements.

requires $O (n d + n^{1 + p})$ memory words

Tensoring

to reduce:

pre-processing time
space
expected query time

create $t$ independent data structures $D_{1}, \dots, D_{t}$ each using $m^{1/ t}$ Gaussian vectors $a_{i, j}$ .

Search in all buckets.

🌱 Enrico's Digital Garden

Explorer

@aumullerSimpleLinearSpace2024

Notes

Tensoring

Annotations

Graph View

Table of Contents