2. Dev Learning

Bloom Filter

From Big Data notes, Bloom Filter:

  • Keep an array  of bits
  • With  hash functions, do 
  • For the membership test,  if
  • probability that  is erroneously claimed to be in  is:

typically is a small constant which depends on the desired false error rate is proportional to

Insertion:

  • feed the element to the hash functions to get array positions
  • set positions bits to 1 Testing:
  • feed the element to the hash functions to get array positions
  • two cases
    • any of the position is at 0 = element def not in the set
    • otherwise = either:
      • the element is in the set
      • bits have by chance been set to 1 during insertion of other elements (no way to tell the difference)

This is the simple Bloom Filter.