Components
Select Connection: INPUT[inlineListSuggester(optionQuery(#area)):connections]
Date Created: INPUT[dateTime(defaultValue(null)):Date_Created]
Due Date: INPUT[dateTime(defaultValue(null)):Due_Date]
Priority Level: INPUT[inlineSelect(option(1 Critical), option(2 High), option(3 Medium), option(4 Low)):Priority_Level]
Status: INPUT[inlineSelect(option(1 To Do), option(2 In Progress), option(3 Testing), option(4 Completed), option(5 Blocked)):Status]
Description
Let be the dataset, a query, the number of neighbors to find, the success probability (or expected recall), and a generic point of the dataset. The modified PUFFINN algorithm returns set PQ of at most points such that, with probability at least , it contains points such that: where is the true th nearest neighbor.
Proof
Let be the nearest neighbors of , with k. At each stage of the algorithm, the invariant ” or ” is satisfied even with the modification, due to the fact that the new probability stopping condition either accelerates termination or defaults to the original guarantee.
Let’s prove the original invariant by induction. base: is empty, so the condition holds induction: assume that the invariant holds before inserting a new point into PQ. We need to prove it still holds after the insertion. There are two cases:
- . After the insertions, either is still or it becomes exactly . In the latter case, we need to show that
- . If x is not inserted because , the invariant still holds. Otherwise, it replaces the previous , since the algorithm replaces the point with highest distance. We need to show that new satisfies We still have to prove that for any new . By construction, it holds that for any point , . Since is the true k nearest neighbor, we have that:
From the definition and the monotonicity of the LSH Family, we can conclude that implies , thus concluding the proof of the invariant.
Now we want to find the minimum number of tries to visit before returning a set of at most points each one with probability at least to be among the true k nearest neighbors or within .
Two cases may occur:
- (or ), this case reverts to the original PUFFINN guarantee since . Suppose we are at level of the search and the probability of collision at this level is . If we perform independent searches at level , the probability that none of these candidates collide with is . Our aim is to find that satisfies:
At this point we can use the Taylor expansion , on the dividend, to obtain:
since we proved that implies , we can conclude the proof:
- (or )). The proof repeats as before, with instead of . Obtaining that with: we can guarantee, with probability that all with (i.e ) are found.