Components

Select Connection: INPUT[inlineListSuggester(optionQuery(#area)):connections] Date Created: INPUT[dateTime(defaultValue(null)):Date_Created] Due Date: INPUT[dateTime(defaultValue(null)):Due_Date] Priority Level: INPUT[inlineSelect(option(1 Critical), option(2 High), option(3 Medium), option(4 Low)):Priority_Level] Status: INPUT[inlineSelect(option(1 To Do), option(2 In Progress), option(3 Testing), option(4 Completed), option(5 Blocked)):Status]

Description

Let $\Gamma = r_{\max} / \rho$, where $r_{\max}$ is the maximum radius of any cluster and $\rho = \mathrm{dist}(q, x_{k})$ is the distance to the true $k$-th nearest neighbor. If the dataset $S$ belongs to a metric space of doubling dimension $D$, then, with probability $\delta$, the query algorithm terminates with expected running time:

$$O\left(\Gamma^{D} \cdot \left(\mathrm{OPT}(L, K, k, (1 - \delta)/k) + L(K + k)\right)\right)$$

proof

The query algorithm defined in Algorithm \ref{} returns the true $k$ nearest neighbors with probability $\delta$, hence the time complexity result we derive also holds with probability $\delta$. At iteration $t$ of the algorithm, let $x_{k}^{\prime (t)}$ denote the farthest point (in terms of distance to $q$) stored in the priority queue PQ, and let

$$R^{(t)} = \mathrm{dist}(q, x_{k}^{\prime (t)}) + r_{i} \leq \mathrm{dist}(q, x_{k}^{\prime (t)}) + r_{\max}$$

where $r_{i}$ is the radius of cluster $i$ and $r_{\max}$ is the maximum radius over all clusters. Initially, at $t = 0$, PQ is empty, so we take $\mathrm{dist}(q, x_{k}^{\prime (0)}) = \infty$ and hence $R^{(0)} = \infty$. As the algorithm proceeds through the following iterations, $\mathrm{dist}(q, x_{k}^{\prime (t)})$ decreases until it converges to the true $k$-th nearest neighbor distance $\rho = \mathrm{dist}(q, x_{k})$. To ensure that all points within distance $\rho$ of $q$ are found, CLANN must examine every cluster that intersects the $\rho$-ball centered at $q$. For a cluster $C_{i}$ with center $c_{i}$ to intersect the $\rho$-ball, it must hold that $\mathrm{dist}(q, c_{i}) - r_{i} \leq \rho$. Rearranging, we obtain:

$$\mathrm{dist}(q, c_{i}) \leq \rho + r_{i} \leq \rho + r_{\max}$$

Therefore, every cluster contributing to the final $k$ nearest neighbors must have its center within a ball of radius $R = \rho + r_{\max}$ centered at $q$.
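The intersection condition above can be checked numerically. The following is a minimal sketch, assuming Euclidean distance as a stand-in for the generic metric; the toy clusters, the query point, and the helper name `intersects_rho_ball` are hypothetical and only serve to illustrate the inequality.

```python
import math

def intersects_rho_ball(q, center, radius, rho):
    """A cluster can hold a point within rho of q only if
    dist(q, center) - radius <= rho (Euclidean metric assumed)."""
    return math.dist(q, center) - radius <= rho

# Hypothetical toy clusters given as (center, radius).
clusters = [((0.0, 0.0), 1.0), ((3.0, 0.0), 0.5), ((10.0, 0.0), 2.0)]
q, rho = (1.0, 0.0), 1.6
r_max = max(r for _, r in clusters)

# Keep only the clusters that can intersect the rho-ball around q.
relevant = [(c, r) for c, r in clusters if intersects_rho_ball(q, c, r, rho)]

# Every relevant center lies in the ball of radius R = rho + r_max around q.
assert all(math.dist(q, c) <= rho + r_max for c, _ in relevant)
```

In this toy instance the third cluster is excluded, since its nearest possible point is at distance 7 from $q$.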

Given that the dataset $S$ belongs to a metric space of doubling dimension $D$, we can apply the doubling property to bound the number of relevant clusters. This property lets us cover a ball of radius $R$ with $2^{D}$ balls of radius $R/2$, then each of those with $2^{D}$ balls of radius $R/4$, and so on. At scale $R / 2^{i}$ the number of covering balls needed is $O(2^{iD})$. Summing over all scales for which $R / 2^{i} \geq \rho$, we get the total number of potentially relevant clusters:

$$M = \sum_{i=0}^{\lceil \log_{2}(R/\rho) \rceil} 2^{iD} = \frac{2^{D}(R/\rho)^{D} - 1}{2^{D} - 1} = O\left(\frac{2^{D}(R/\rho)^{D}}{2^{D}}\right) = O\left(\left(\frac{R}{\rho}\right)^{D}\right)$$

Define $\Gamma = \frac{r_{\max}}{\rho}$; then:

$$M = O\left(\left(\frac{R}{\rho}\right)^{D}\right) = O\left(\left(\frac{\rho + r_{\max}}{\rho}\right)^{D}\right) = O\left((1 + \Gamma)^{D}\right)$$

When $\Gamma \gg 1$ (i.e., when the maximum cluster radius is significantly larger than the distance to the $k$-th nearest neighbor), this simplifies to $M = O(\Gamma^{D})$. Having bounded the number of relevant clusters, we can now complete the running time analysis.
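As a sanity check on the counting argument, the sketch below evaluates the geometric sum directly and compares it with the closed form and with $(1 + \Gamma)^{D}$. The function names and the numeric values are illustrative assumptions; the values are chosen so that $\log_{2}(R/\rho)$ is an integer, which is the case where the closed form is exact.

```python
import math

def cluster_bound(R, rho, D):
    """Sum 2^(i*D) over the scales i = 0 .. ceil(log2(R / rho))."""
    levels = math.ceil(math.log2(R / rho))
    return sum(2 ** (i * D) for i in range(levels + 1))

def closed_form(R, rho, D):
    """Closed form of the geometric sum, exact when log2(R / rho) is an integer."""
    return (2 ** D * (R / rho) ** D - 1) / (2 ** D - 1)

# Illustrative values: R = rho + r_max = 8, so Gamma = 7 and log2(R / rho) = 3.
rho, r_max, D = 1.0, 7.0, 2
R = rho + r_max
gamma = r_max / rho

M = cluster_bound(R, rho, D)          # 1 + 4 + 16 + 64
ratio = M / (1 + gamma) ** D          # constant factor hidden by the O-notation
```

Here the ratio between the sum and $(1 + \Gamma)^{D}$ stays below $2^{D} / (2^{D} - 1)$, which is the constant absorbed by the $O(\cdot)$ in the derivation.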

For each of the $M$ clusters, the algorithm performs two main operations:

1. A call to the PUFFINN search algorithm, which returns the candidate nearest neighbors in expected time $O(\mathrm{OPT}(L, K, k, (1 - \delta)/k) + L(K + k))$ with probability $\delta$. This time is tied to the optimal expected running time of an algorithm that knows the optimal choices for the number of tries and the depth of each one.
2. At most $k$ priority queue insertions, each requiring $O(\log k)$ time, for a total of $O(k \log k)$ per cluster.

Additionally, the initial sorting of the clusters takes $O(|C| \log |C|)$ time. Thus, the overall expected running time is:

$$O\left(|C| \log |C| + \Gamma^{D} \cdot \left(\left(\mathrm{OPT}(L, K, k, (1 - \delta)/k) + L(K + k)\right) + k \log k\right)\right)$$

Under the assumption that the sorting term and the $k \log k$ term are dominated by the PUFFINN search cost (which typically holds for large datasets), this expression simplifies asymptotically to:

$$O\left(\Gamma^{D} \cdot \left(\mathrm{OPT}(L, K, k, (1 - \delta)/k) + L(K + k)\right)\right)$$

concluding the proof.
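To make the cost terms concrete, here is a rough sketch of a cluster-by-cluster query loop. It is not the algorithm referenced in the proof: the per-cluster search is a brute-force placeholder rather than PUFFINN, the metric is assumed Euclidean, and the function name `knn_over_clusters` and the cluster layout `(center, radius, points)` are hypothetical. It only illustrates where the $O(|C| \log |C|)$ sort, the pruning test, and the at most $k$ heap insertions per cluster appear.

```python
import heapq
import math

def knn_over_clusters(q, clusters, k):
    """clusters: list of (center, radius, points). Returns the k points nearest to q."""
    # Initial sort of the clusters by center distance: O(|C| log |C|).
    order = sorted(clusters, key=lambda c: math.dist(q, c[0]))
    heap = []  # max-heap via negated distances, holds at most k entries
    for center, radius, points in order:
        kth = -heap[0][0] if len(heap) == k else math.inf
        # Skip clusters that cannot intersect the ball of the current k-th distance.
        if math.dist(q, center) - radius > kth:
            continue
        # Brute-force placeholder for the PUFFINN call on this cluster.
        candidates = heapq.nsmallest(k, points, key=lambda p: math.dist(q, p))
        for p in candidates:  # at most k insertions, O(log k) each
            d = math.dist(q, p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, p))
    # Largest negated distance first, i.e. nearest point first.
    return [p for _, p in sorted(heap, reverse=True)]

# Hypothetical toy data.
clusters = [
    ((0.0, 0.0), 1.5, [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]),
    ((5.0, 5.0), 1.0, [(5.0, 5.0), (5.0, 6.0)]),
    ((2.0, 0.0), 1.0, [(2.0, 0.0), (2.5, 0.0)]),
]
result = knn_over_clusters((0.2, 0.0), clusters, k=2)
```

On this toy input the result agrees with a brute-force scan over all points; the far cluster centered at $(5, 5)$ is pruned without being searched.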

🌱 Enrico's Digital Garden
