2. CLANN

Overview

On [[@ceccarelloSolvingcenterClustering2021]] a technique to bound the size of the clusters using the doubling dimension property is used. This is a rewrite and extension of the proof presented in the paper.

THEOREM

If belongs to a metric space of doubling dimension , then:

Basic Idea

  • prove how many iterations are needed to reach the stopping condition of GMM
  • since the algorithm adds one point at each iteration, we have therefore bounded the number of points Formally, we have to find an upper bound on the number of iterations of GMM needed to obtain

Proof

For each subset ,the k-centers produce a clustering with radius . Using the doubling dimension property , we have that each of the k clusters in can be covered with at most balls of radius . After running iterations of gmm, the algorithm guarantees that any two points in the set 1 are at least apart. This is because the greedy algorithm selects the farthest points each time. Given there are only balls covering , by the Pigeonhole Principle at least two points from must fall into the same ball.

The triangle inequality tells us that if two points are within the same ball, the distance between them is at most twice the radius of that ball. Since each ball has radius , the maximum distance between any two points within the same ball is:

proving that with at most we get to the termination criterion. So, since there are subsets and at each iteration the algorithm adds one point, we can say that:

thus concluding the proof.

Footnotes

  1. : is the farthest point from any center in