Components
Select Connection: INPUT[inlineListSuggester(optionQuery(#area)):connections]
Date Created: INPUT[dateTime(defaultValue(null)):Date_Created]
Due Date: INPUT[dateTime(defaultValue(null)):Due_Date]
Priority Level: INPUT[inlineSelect(option(1 Critical), option(2 High), option(3 Medium), option(4 Low)):Priority_Level]
Status: INPUT[inlineSelect(option(1 To Do), option(2 In Progress), option(3 Testing), option(4 Completed), option(5 Blocked)):Status]
Description
Let where is the maximum radius for any cluster and is the th true nearest neighbor distance. If the dataset belongs to a metric space of doubling dimension then, with probability , we have that the query algorithm terminates with expected running time:
proof
The query algorithm defined in Algorithm \ref{} returns the true nearest neighbors with probability , hence the time complexity result we derive holds with probability . At iteration of the algorithm, denote by as the farthest point (in terms of distance to ) stored in the priority queue PQ and let
where is the radius of the cluster and is the maximum radius of all clusters. Initially, at , PQ is empty so and . As the algorithm proceeds in the next iterations, decreases incrementally until it converges to the true -nearest neighbor distance . To ensure all points within distance of are found, CLANN must examine all clusters that intersect the -ball centered at . For cluster to intersect the -ball it must hold that: Rearranging, we obtain:
Therefore, every cluster contributing to the final -nearest neighbors must lie within a ball of radius centered at .
Given that the dataset belongs to a metric space of doubling dimension , we can apply the doubling space property to bound the number of relevant clusters. This property allows us to recursively cover each ball of radius with balls of half the radius , then each of those with balls of radius and so on. At each scale the number of clusters needed is . Summing over all scales until we get the total number of potentially relevant clusters:
Define , then:
when (i.e., when the maximum cluster radius is significantly larger than the distance to the -th nearest neighbor), this simplifies to . Bounded the number of relevant clusters, we can conclude the running time proof.
For each of the clusters, the algorithm performs two main operations:
- A call to the PUFFINN search algorithm, which returns the candidate nearest neighbors in expected time with probability . This time is linked to the optimal expected running time of an algorithm that knows optimal parameter choices for the number of tries and depth of each one.
- At most priority queue insertions, where each insertion requires time for a total of per cluster. Additionally, the initial sorting of the clusters is performed in time . Thus, the overall expected running time is:
Under typical assumptions where the sorting and term are dominated by the PUFFINN search cost (which is typically true for large datasets), this expression simplifies asymptotically to:
concluding the proof.