Probabilistic Data Structure To Check Membership

image

False positives are possible. it might say, element might exist, even thought it isn’t.

Ref to https://systemdesign.one/bloom-filters-explained/

Requirements

Design a data structure with the following characteristics:

Insertion

There is a probability that some bits on the array are set to one multiple times due to hash collisions.

image

Membership test

image

False positive case

image

Bloom filter calculator

The hur.st bloom filter calculator can be used to choose an optimal size for the bloom filter and explore how the different parameters interact. The accuracy of the bloom filter depends on the following:

The properties of an optimal hash function for the bloom filter are the following:

Summary

image

Ref: https://ricardoanderegg.com/posts/bloom-filters-poster/

Applications

image

For example, one use case of Bloom filters is the following: you have a huge list of malicious URLs. In your browser, before a user navigates to a new URL, you want to check if it’s inside the list of dangerous URLs. You can use a bloom filter to do that! It will take less space than saving the full list of URLs, and if the answer from the Bloom filter is “no” (the URL is not a malicious one), you can safely let the user visit it

image

Reducing disk lookups for the non-existing keys.

Bloom filter disadvantages

The limitations of the bloom filter are the following:

Removing an item from the bloom filter is not supported because there is no possibility to identify the bits that should be cleared. There might be other items that map onto the same bits and clearing any of the bits would introduce the possibility of false negatives.

Extensions of bloom filter

The counting bloom filter includes an array of counters for each bit in the bloom filter array. The counting bloom filter supports the delete operation.

image

Counting Bloom filters are much bigger than classical Bloom filters because additional memory has to be allocated for the counters even if most of them are zeros.