Dynamic perfect hashing

Programming technique for resolving duplicate hash values in a hash table data structure

In computer science, dynamic perfect hashing is a programming technique for resolving collisions in a hash table data structure. While more memory-intensive than its hash table counterparts, this technique is useful for situations where fast queries, insertions, and deletions must be made on a large set of elements.

Details

Static case

FKS Scheme

The problem of optimal static hashing was first solved in general by Fredman, Komlós and Szemerédi. In their 1984 paper, they detail a two-tiered hash table scheme in which each bucket of the (first-level) hash table corresponds to a separate second-level hash table. Keys are hashed twice: the first hash value selects a bucket in the first-level hash table, and the second hash value gives the position of the entry in that bucket's second-level hash table. The second-level table is guaranteed to be collision-free (i.e. perfect hashing) upon construction. Consequently, the look-up cost is guaranteed to be O(1) in the worst case.

In the static case, we are given a set with a total of x entries, each one with a unique key, ahead of time. Fredman, Komlós and Szemerédi pick a first-level hash table with $s = 2(x-1)$ buckets.

To construct the table, the x entries are separated into s buckets by the top-level hash function, where $s = 2(x-1)$. Then, for each bucket with k entries, a second-level table is allocated with $k^{2}$ slots, and its hash function is selected at random from a universal family of hash functions. If the selected function produces a collision among the bucket's k entries, a new function is drawn at random, and this repeats until the table is collision-free (i.e. the function is a perfect hash function for that bucket). The chosen function is stored alongside the second-level table, and the k entries are hashed into it.
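The construction steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the names `P`, `random_hash`, `build_static` and `contains` are hypothetical, keys are assumed to be non-negative integers below `P`, and the universal family is assumed to be the common `((a*key + b) mod P) mod m` construction. For brevity the first-level function is drawn once, without checking any space bound.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be ints in [0, P)

def random_hash(m, rng):
    """Draw h(key) = ((a*key + b) mod P) mod m from a universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda key: ((a * key + b) % P) % m

def build_static(keys, rng=None):
    """Build a two-level table for a fixed set of distinct integer keys."""
    rng = rng or random.Random()
    keys = list(set(keys))            # the scheme assumes unique keys
    x = len(keys)
    s = max(1, 2 * (x - 1))           # s = 2(x-1), clamped so tiny inputs still get a bucket
    top = random_hash(s, rng)
    buckets = [[] for _ in range(s)]
    for key in keys:
        buckets[top(key)].append(key)

    tables = []
    for bucket in buckets:
        size = len(bucket) ** 2       # k^2 slots for a bucket with k entries
        while True:                   # redraw until the bucket is collision-free
            h = random_hash(size, rng) if size else None
            slots = [None] * size
            for key in bucket:
                i = h(key)
                if slots[i] is not None:
                    break             # collision: try another function
                slots[i] = key
            else:
                break                 # every key landed in its own slot
        tables.append((h, slots))
    return top, tables

def contains(structure, key):
    """Look up a key with two hash evaluations and one comparison."""
    top, tables = structure
    h, slots = tables[top(key)]
    return bool(slots) and slots[h(key)] == key
```

A lookup never scans a bucket: `contains(build_static([3, 17, 42]), 17)` evaluates the first-level function, then the bucket's stored second-level function, and compares a single slot.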

The quadratic $k^{2}$ size of each second-level table ensures that a randomly chosen hash function produces a collision only infrequently, with a probability that is bounded independently of k, giving linear expected construction time. Although each second-level table requires quadratic space, if the keys inserted into the first-level hash table are uniformly distributed, the structure as a whole occupies expected $O(n)$ space, since bucket sizes are small with high probability.
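A short derivation shows why the retry loop is cheap. It assumes the family is universal in the usual sense, meaning any two distinct keys collide with probability at most $1/m$ in a table of $m$ slots; the symbol $C$, the number of colliding pairs among the k entries of one bucket, is introduced here only for illustration.

```latex
\[
\mathbb{E}[C] \;\le\; \binom{k}{2} \cdot \frac{1}{k^{2}} \;=\; \frac{k-1}{2k} \;<\; \frac{1}{2},
\qquad
\Pr[C \ge 1] \;\le\; \mathbb{E}[C] \;<\; \frac{1}{2}.
\]
```

The second inequality is Markov's inequality. Each draw therefore succeeds with probability greater than 1/2 regardless of k, so the expected number of draws per bucket is less than 2.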

The first-level hash function is chosen so that, for the specific set of x unique key values, the total space T used by all the second-level hash tables is $O(n)$ in expectation, and more specifically $T < s + 4 \cdot x$. Fredman, Komlós and Szemerédi showed that, given a universal family of hash functions, at least half of those functions have that property.
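Because at least half of the functions in the family satisfy the bound, a suitable first-level function can be found by redrawing until the bound holds. The following Python sketch illustrates that selection loop under assumptions: `P`, `random_hash` and `choose_first_level` are hypothetical names, keys are assumed to be non-negative integers below `P`, and the `((a*key + b) mod P) mod m` family stands in for whichever universal family is actually used.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be ints in [0, P)

def random_hash(m, rng):
    """Draw h(key) = ((a*key + b) mod P) mod m from a universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda key: ((a * key + b) % P) % m

def choose_first_level(keys, rng=None):
    """Redraw the first-level function until T < s + 4*x holds.

    Returns (top, s, T), where T is the sum of k^2 over all buckets,
    i.e. the total number of second-level slots that would be allocated.
    """
    rng = rng or random.Random()
    keys = list(set(keys))        # the scheme assumes unique keys
    x = len(keys)
    s = max(1, 2 * (x - 1))       # s = 2(x-1), clamped for tiny inputs
    while True:
        top = random_hash(s, rng)
        counts = [0] * s
        for key in keys:
            counts[top(key)] += 1
        total = sum(k * k for k in counts)
        if total < s + 4 * x:     # the space bound from the text
            return top, s, total
```

Each attempt costs one pass over the keys, so if at least half of the candidate functions are acceptable, the expected total work for this step stays linear in x.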

Dynamic case

Dietzfelbinger et al. present a dynamic dictionary algorithm with the following guarantees when a set of n items is incrementally added to the dictionary: membership queries run in $O(1)$ worst-case time, the total storage required is $O(n)$ (linear), and insertions and deletions take $O(1)$ expected amortized time (amortized constant time).

In the dynamic case, when a key is inserted into the hash table, if its entry in its respective subtable is occupied, then a collision is said to occur and the subtable is rebuilt, with a size based on its new total entry count and a newly selected random hash function. Because the load factor of the second-level table is kept low, at $1/k$, rebuilding is infrequent, and the amortized expected cost of insertions is $O(1)$. Similarly, the amortized expected cost of deletions is $O(1)$.
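The insertion rule described above can be sketched in Python. This is a deliberately simplified illustration and not the full algorithm of Dietzfelbinger et al.: it keeps a fixed number of first-level buckets, never rebuilds the whole structure, and omits deletion. All names (`P`, `random_hash`, `Subtable`, `DynamicSketch`, `rebuilds`) are hypothetical, and keys are assumed to be non-negative integers below `P`.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be ints in [0, P)

def random_hash(m, rng):
    """Draw h(key) = ((a*key + b) mod P) mod m from a universal family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda key: ((a * key + b) % P) % m

class Subtable:
    def __init__(self):
        self.keys, self.slots, self.h = [], [], None

    def rebuild(self, keys, rng):
        """Re-hash `keys` (distinct, non-empty) into k^2 slots, collision-free."""
        self.keys = keys
        size = len(keys) ** 2
        while True:
            h = random_hash(size, rng)
            slots = [None] * size
            for key in keys:
                i = h(key)
                if slots[i] is not None:
                    break                  # collision: draw another function
                slots[i] = key
            else:
                self.h, self.slots = h, slots
                return

class DynamicSketch:
    def __init__(self, buckets=16, seed=None):
        self.rng = random.Random(seed)
        self.top = random_hash(buckets, self.rng)
        self.sub = [Subtable() for _ in range(buckets)]
        self.rebuilds = 0                  # counts subtable (re)builds, for inspection

    def contains(self, key):
        t = self.sub[self.top(key)]
        return bool(t.slots) and t.slots[t.h(key)] == key

    def insert(self, key):
        t = self.sub[self.top(key)]
        if t.slots:
            i = t.h(key)
            if t.slots[i] == key:
                return                     # already present
            if t.slots[i] is None:
                t.slots[i] = key           # free slot: no rebuild needed
                t.keys.append(key)
                return
        # Occupied slot (or a subtable that was never allocated): rebuild it
        # for the new entry count with a freshly drawn hash function.
        t.rebuild(t.keys + [key], self.rng)
        self.rebuilds += 1
```

In this sketch most insertions find a free slot and finish after two hash evaluations; only an occupied slot triggers the more expensive rebuild, which is the cost that the amortized analysis spreads across insertions.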

Content sourced from Wikipedia under CC BY-SA 4.0
