Chapter 8: Dictionaries II

8.1.1 The Dictionary ADT
Operations on a dictionary D, each given as function: input → output, followed by its description:
  • size(): no input → Integer. Return the number of items in D.
  • isEmpty(): no input → Boolean. Test whether D is empty.
  • elements(): no input → Iterator of objects (elements). Return the elements stored in D.
  • keys(): no input → Iterator of objects (keys). Return the keys stored in D.
  • find(k): Object (key) → Position. If D contains an item with key equal to k, return the position of such an item; otherwise, return a null position.
  • findAll(k): Object (key) → Iterator of Positions. Return an iterator of positions for all items whose key equals k.
  • insertItem(k, e): Objects k (key) and e (element) → no output. Insert an item with element e and key k into D.
  • removeElement(k): Object (key) → no output. Remove an item with key equal to k from D. An error condition occurs if D has no such item.
  • removeAllElements(k): Object (key) → no output. Remove all items with key equal to k from D.
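A minimal C++ sketch of the operations above, assuming a std::multimap as the underlying store, so duplicate keys are allowed as findAll and removeAllElements imply. The class name MultimapDictionary and the nullPosition() helper are hypothetical; std::vector stands in for the iterators, and a multimap const_iterator plays the role of a Position, with end() as the null position. This is not a hash table; it only pins down the interface that the hash tables below implement.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical dictionary backed by std::multimap (a sketch of the ADT, not the book's code).
// A Position is a const_iterator; end() serves as the null position.
template <typename K, typename E>
class MultimapDictionary {
public:
    using Position = typename std::multimap<K, E>::const_iterator;

    int size() const { return static_cast<int>(items.size()); }
    bool isEmpty() const { return items.empty(); }

    std::vector<E> elements() const {          // vector stands in for an iterator
        std::vector<E> out;
        for (const auto& kv : items) out.push_back(kv.second);
        return out;
    }
    std::vector<K> keys() const {
        std::vector<K> out;
        for (const auto& kv : items) out.push_back(kv.first);
        return out;
    }

    Position find(const K& k) const { return items.find(k); }   // end() if absent
    Position nullPosition() const { return items.end(); }

    std::vector<Position> findAll(const K& k) const {
        std::vector<Position> out;
        auto range = items.equal_range(k);     // all items whose key equals k
        for (auto it = range.first; it != range.second; ++it) out.push_back(it);
        return out;
    }

    void insertItem(const K& k, const E& e) { items.emplace(k, e); }

    void removeElement(const K& k) {           // removes one item; error if none
        auto it = items.find(k);
        if (it == items.end()) throw std::runtime_error("removeElement: key not found");
        items.erase(it);
    }

    void removeAllElements(const K& k) { items.erase(k); }  // removes every match

private:
    std::multimap<K, E> items;
};
```

For example, after inserting keys 7, 7 and 3, size() is 3 and findAll(7) yields two positions; removeElement(7) removes just one of them.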


8.2 Hash Tables

8.2.1 Bucket Arrays

Data Structures and Algorithms in C++
Analysis of the Bucket Array Structure
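A bucket array is an array A of N buckets for integer keys in the range [0, N − 1]; an item with key k is stored in bucket A[k]. The sketch below (hypothetical class BucketArray, string elements assumed) shows that each operation works on the single bucket A[k], which is what makes it fast, and also the drawback: the array always occupies N cells however few items are stored, so it is wasteful when N is much larger than the number of items.

```cpp
#include <cassert>
#include <list>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical bucket array for integer keys in [0, N-1]: bucket A[k] holds
// every element whose key is k, so no hash function is needed.
class BucketArray {
public:
    explicit BucketArray(int N) : A(N) {}

    void insertItem(int k, const std::string& e) { bucket(k).push_back(e); }

    // Returns a pointer to some element with key k, or nullptr if there is none.
    const std::string* find(int k) {
        std::list<std::string>& b = bucket(k);
        return b.empty() ? nullptr : &b.front();
    }

    void removeAllElements(int k) { bucket(k).clear(); }

    // The array has N cells whether or not they are used.
    int capacity() const { return static_cast<int>(A.size()); }

private:
    std::list<std::string>& bucket(int k) {
        if (k < 0 || k >= static_cast<int>(A.size()))
            throw std::out_of_range("key outside [0, N-1]");
        return A[k];
    }
    std::vector<std::list<std::string>> A;  // one bucket per possible key
};
```

For instance, BucketArray ba(100) reserves 100 buckets even if it only ever stores the item with key 42.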
8.2.2 Hash Functions

  • We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
  • Our hash table uses an array of size N = 10 000 and the hash function h(x) = last four digits of x
  • To avoid any collision, we would have to use N = 1 000 000 000 (one cell per possible nine-digit SSN) and the hash function h(x) = x (Drawback 1)
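The hash function in this example is simply x mod 10 000. A short sketch; the names h and N follow the slide, while the range check is an added assumption:

```cpp
#include <cassert>

// Table size from the example above: N = 10 000 cells.
const long N = 10000;

// h(x) = last four decimal digits of a nine-digit SSN x, i.e. x mod 10 000.
// Two SSNs that end in the same four digits collide.
long h(long ssn) {
    assert(ssn >= 0 && ssn <= 999999999);  // at most nine decimal digits
    return ssn % N;
}
```

Both h(123456789) and h(987656789) give 6789, so those two SSNs collide; only a table with one cell per possible SSN avoids collisions altogether.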

8.2.3 Hash Codes
Hash Codes in C++
A Small C++ Example
Hashing a 32-bit integer into a 32-bit integer hash code: use the value itself
int hashCode(int x)
{ return x; }
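Hashing a 64-bit integer into a 32-bit integer hash code: add its high and low 32-bit halves
int hashCode(long long x)   // long long is at least 64 bits; long may be only 32
{ typedef unsigned long long ulong;
  unsigned int high = static_cast<unsigned int>(static_cast<ulong>(x) >> 32);  // high 32 bits
  unsigned int low = static_cast<unsigned int>(x);                              // low 32 bits
  return hashCode(static_cast<int>(high + low));  // unsigned sum wraps; no signed overflow
}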
Polynomial Hash Codes
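A polynomial hash code treats the characters x0, x1, ..., x(k−1) of a string as the coefficients of x0·a^(k−1) + x1·a^(k−2) + ... + x(k−1) for a fixed constant a, so that, unlike a plain sum of the characters, the code depends on their order. The sketch below evaluates this polynomial with Horner's rule and lets unsigned arithmetic wrap on overflow; the function name polyHashCode and the default a = 33 are assumptions for illustration.

```cpp
#include <cassert>

// Polynomial hash code (sketch): evaluates
//   x0*a^(len-1) + x1*a^(len-2) + ... + x(len-1)
// with Horner's rule. Unsigned arithmetic wraps on overflow, so overflow is
// simply ignored. The default a = 33 is an assumed choice.
unsigned int polyHashCode(const char* p, int len, unsigned int a = 33) {
    unsigned int h = 0;
    for (int i = 0; i < len; i++)
        h = h * a + static_cast<unsigned char>(p[i]);  // Horner step
    return h;
}
```

"ab" and "ba" get different codes (97·33 + 98 = 3299 versus 98·33 + 97 = 3331), whereas a plain character sum gives both 195.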
Cyclic Shift Hash Codes
int hashCode(const char* p, int len) // hash a character array
{ unsigned int h = 0;
for (int i = 0; i < len; i++)
{ h = (h << 5)|(h >> 27); // 5-bit cyclic shift
h += (unsigned int)p[i]; // add in next character
}
return hashCode(int(h));
}
Experimental Results
Cyclic shift hash codes computed for 25 000 English words:
Shift   Total collisions   Max collisions
0       23739              86
1       10517              21
5       4                  2
6       6                  2
11      453                4
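An experiment like this can be reproduced with a small harness that hashes every word with a given shift and counts collisions. No word list is included here, and the counting rules are assumptions since the table does not define them: "total" counts words whose hash code was already produced by an earlier word, and "max" is the largest number of words sharing one hash code. The names cyclicShiftHash and countCollisions are hypothetical, and a 32-bit unsigned int is assumed.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Cyclic-shift hash with a configurable shift (the code above fixes it at 5).
// Assumes 32-bit unsigned int. Shift 0 skips the rotation, because
// h >> 32 would be undefined behavior.
unsigned int cyclicShiftHash(const std::string& s, unsigned int shift) {
    assert(shift < 32);
    unsigned int h = 0;
    for (char c : s) {
        if (shift != 0) h = (h << shift) | (h >> (32 - shift));  // cyclic shift
        h += static_cast<unsigned int>(c);                        // add next character
    }
    return h;
}

// Returns {total, max} under the counting rules stated in the lead-in.
std::pair<int, int> countCollisions(const std::vector<std::string>& words,
                                    unsigned int shift) {
    std::unordered_map<unsigned int, int> freq;  // hash code -> number of words
    int total = 0, maxShared = 0;
    for (const std::string& w : words) {
        int& n = freq[cyclicShiftHash(w, shift)];
        if (n > 0) total++;                      // collides with an earlier word
        n++;
        if (n > maxShared) maxShared = n;
    }
    return {total, maxShared};
}
```

With shift 0 the hash is just the sum of the character codes, so any two anagrams such as "ab" and "ba" collide; with shift 5 they do not, which fits the sharp drop in collisions shown in the table.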

Hashing Floating-Point Quantities
int hashCode(const double& x)       // hash a double
{ int len = sizeof(x);
const char* p = reinterpret_cast<const char *>(&x);
return hashCode(p, len);
}
C++ provides reinterpret_cast to cast between unrelated types such as these (here, a pointer to a double and a pointer to an array of characters).
This cast treats a quantity as a sequence of bits and makes no attempt to convert the meaning of one quantity into another.
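One subtlety of hashing a double by its bits: 0.0 and -0.0 compare equal but have different bit patterns, so a purely bitwise hash gives them different codes (and NaN never compares equal to itself, bitwise hash or not). The sketch below, with hypothetical names hashBytes and hashDouble, normalizes -0.0 to 0.0 before reinterpreting the bytes; this normalization step is an addition, not part of the original code. It reads the bytes through unsigned char, which C++ permits for any object.

```cpp
#include <cassert>
#include <cstddef>

// Byte-wise cyclic-shift hash, same scheme as the character-array version
// above (assumes 32-bit unsigned int).
unsigned int hashBytes(const unsigned char* p, std::size_t len) {
    unsigned int h = 0;
    for (std::size_t i = 0; i < len; i++) {
        h = (h << 5) | (h >> 27);  // 5-bit cyclic shift
        h += p[i];                 // add in next byte
    }
    return h;
}

// Hash a double by its bit pattern, after mapping -0.0 to +0.0 so that
// the two equal zeros share one hash code.
unsigned int hashDouble(double x) {
    if (x == 0.0) x = 0.0;  // -0.0 == 0.0 is true, so this normalizes the sign
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&x);
    return hashBytes(p, sizeof(x));
}
```

With the normalization, hashDouble(0.0) and hashDouble(-0.0) agree (both are 0, since every byte of +0.0 is zero).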

hash_code.cpp

8.2.4 Compression Maps

The Division Method
The MAD Method
Multiply, Add and Divide (MAD):
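The slide names the two methods without formulas. Assuming the usual definitions, the division method computes h(k) = |k| mod N, and the MAD method computes h(k) = |ak + b| mod N, where N is prime and a and b are nonnegative integers chosen so that a mod N ≠ 0. A sketch with hypothetical function names, using long long and assuming a·k + b does not overflow:

```cpp
#include <cassert>
#include <cstdlib>

// Division method: h(k) = |k| mod N (N is typically chosen prime).
long long divisionCompress(long long k, long long N) {
    return std::llabs(k) % N;
}

// MAD method: h(k) = |a*k + b| mod N, with N prime and a mod N != 0.
// Assumes a*k + b fits in a long long.
long long madCompress(long long k, long long a, long long b, long long N) {
    assert(a % N != 0);  // precondition of the MAD method
    return std::llabs(a * k + b) % N;
}
```

For N = 13, divisionCompress(-27, 13) is 1, and madCompress(10, 3, 4, 13) is |34| mod 13 = 8.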
8.2.5 Collision-Handling Schemes
Separate Chaining
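In separate chaining, each cell A[i] of the bucket array holds a small list (a chain) of all items whose keys hash to i, so colliding items simply share a chain. A minimal sketch with int keys, string elements and h(k) = |k| mod N; the class ChainHashTable and its chainLength helper are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical separate-chaining hash table from int keys to strings.
// Cell A[i] is a chain of every item whose key hashes to i.
class ChainHashTable {
public:
    explicit ChainHashTable(int N) : A(N) {}

    void insertItem(int k, const std::string& e) { A[h(k)].push_back({k, e}); }

    // Returns a pointer to the element of some item with key k, or nullptr.
    const std::string* find(int k) const {
        for (const auto& item : A[h(k)])  // scan only k's chain
            if (item.first == k) return &item.second;
        return nullptr;
    }

    void removeAllElements(int k) {
        A[h(k)].remove_if([k](const std::pair<int, std::string>& item) {
            return item.first == k;
        });
    }

    int chainLength(int i) const { return static_cast<int>(A[i].size()); }

private:
    // Division-method compression: h(k) = |k| mod N.
    int h(int k) const {
        long long kk = k;
        if (kk < 0) kk = -kk;
        return static_cast<int>(kk % static_cast<long long>(A.size()));
    }
    std::vector<std::list<std::pair<int, std::string>>> A;
};
```

With N = 13, keys 18, 44 and 31 all hash to cell 5 and end up in the same chain, and find(44) only scans that chain.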

Open Addressing Approach
Open addressing: the colliding item is placed in a different cell of the table

Linear Probing

  • h(x) = x mod 13
  • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
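A sketch that replays this example, assuming non-negative keys and using -1 to mark an empty cell; the names linearProbeInsert and buildExample are hypothetical.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // marks an unused cell (keys assumed non-negative)

// Insert key k with linear probing and h(k) = k mod N: try h(k), h(k)+1, ...
// (wrapping around) until an empty cell is found. Returns the cell index,
// or -1 if all N cells are occupied.
int linearProbeInsert(std::vector<int>& A, int k) {
    const int N = static_cast<int>(A.size());
    int i = k % N;
    for (int p = 0; p < N; p++) {
        if (A[i] == EMPTY) { A[i] = k; return i; }
        i = (i + 1) % N;  // next consecutive cell
    }
    return -1;  // table full
}

// Builds the table for the example above (N = 13).
std::vector<int> buildExample() {
    std::vector<int> A(13, EMPTY);
    for (int k : {18, 41, 22, 44, 59, 32, 31, 73}) linearProbeInsert(A, k);
    return A;
}
```

The resulting table holds 41 in cell 2, 18 in cell 5, 44 in 6, 59 in 7, 32 in 8, 22 in 9, 31 in 10 and 73 in 11; cells 0, 1, 3, 4 and 12 stay empty. Key 31 needs six probes (cells 5 through 10) because of the cluster that builds up around cell 5.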

Search with Linear Probing
  • Consider a hash table A that uses linear probing
  • find(k)
    • We start at cell h(k)
    • We probe consecutive locations until one of the following occurs:
      • An item with key k is found, or
      • An empty cell is found, or
      • N cells have been unsuccessfully probed
Algorithm find(k)
   i ← h(k)
   p ← 0
   repeat
      c ← A[i]
      if c = ∅
          return Position(null)
      else if c.key() = k
          return Position(c)
      else
          i ← (i + 1) mod N
          p ← p + 1
   until p = N
   return Position(null)
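The pseudocode translates almost line for line into C++. In this sketch the table is a std::vector<int> of non-negative keys with -1 for an empty cell, h(k) = k mod N, and the null position is represented by -1; linearProbeFind is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // unused cell (keys assumed non-negative)

// find(k) for linear probing with h(k) = k mod N.
// Returns the index of the cell holding k, or -1 for the null position.
int linearProbeFind(const std::vector<int>& A, int k) {
    const int N = static_cast<int>(A.size());
    int i = k % N;                     // start at cell h(k)
    for (int p = 0; p < N; p++) {      // at most N probes
        if (A[i] == EMPTY) return -1;  // empty cell: k is not in the table
        if (A[i] == k) return i;       // found
        i = (i + 1) % N;               // probe the next cell
    }
    return -1;                         // all N cells probed unsuccessfully
}
```

On the table built by the linear probing example, find(31) returns cell 10, while find(45) starts at cell 6, scans the occupied cells 6 to 11, and stops at the empty cell 12 with the null position.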

Updates with Linear Probing
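The slide gives only the heading here. A common way to support removal under linear probing, assumed in this sketch, is to mark a removed cell with a special AVAILABLE value instead of emptying it: find() keeps probing past AVAILABLE cells, so keys further along the cluster stay reachable, while insertItem() may reuse them. Keys are assumed non-negative; EMPTY, AVAILABLE and LinearProbeTable are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Cell markers (keys assumed non-negative). AVAILABLE marks a removed cell.
const int EMPTY = -1;
const int AVAILABLE = -2;

class LinearProbeTable {
public:
    explicit LinearProbeTable(int N) : A(N, EMPTY) {}

    int find(int k) const {  // index of a cell holding k, or -1
        const int N = static_cast<int>(A.size());
        int i = k % N;
        for (int p = 0; p < N; p++) {
            if (A[i] == EMPTY) return -1;  // AVAILABLE does not stop the search
            if (A[i] == k) return i;
            i = (i + 1) % N;
        }
        return -1;
    }

    void insertItem(int k) {  // first EMPTY or AVAILABLE cell from h(k)
        const int N = static_cast<int>(A.size());
        int i = k % N;
        for (int p = 0; p < N; p++) {
            if (A[i] == EMPTY || A[i] == AVAILABLE) { A[i] = k; return; }
            i = (i + 1) % N;
        }
        throw std::runtime_error("insertItem: table is full");
    }

    void removeElement(int k) {
        int i = find(k);
        if (i < 0) throw std::runtime_error("removeElement: key not found");
        A[i] = AVAILABLE;  // mark as available instead of resetting to EMPTY
    }

private:
    std::vector<int> A;
};
```

For example, with N = 13 insert 18, 44 and 31 (cells 5, 6 and 7), then remove 44. Had cell 6 been reset to EMPTY, find(31) would stop there and fail; with AVAILABLE it still reaches cell 7, and a later insertItem(57) (57 mod 13 = 5) reuses cell 6.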
Double Hashing

  • Consider a hash table storing integer keys that handles collisions with double hashing
    • N = 13
    • h(k) = k mod 13
    • d(k) = 7 − (k mod 7)
  • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
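A sketch replaying the double hashing example: the j-th probe for key k goes to cell (h(k) + j·d(k)) mod N. Keys are assumed non-negative, -1 marks an empty cell, and doubleHashInsert and buildDoubleHashExample are hypothetical names.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // unused cell (keys assumed non-negative)

// Double hashing with h(k) = k mod N and d(k) = 7 - (k mod 7):
// probe cells (h(k) + j*d(k)) mod N for j = 0, 1, 2, ...
// With N = 13 prime and d(k) in [1, 7], the probes visit every cell.
int doubleHashInsert(std::vector<int>& A, int k) {
    const int N = static_cast<int>(A.size());
    const int h = k % N;
    const int d = 7 - k % 7;
    for (int j = 0; j < N; j++) {
        int i = (h + j * d) % N;
        if (A[i] == EMPTY) { A[i] = k; return i; }
    }
    return -1;  // table full
}

// Builds the table for the example above (N = 13).
std::vector<int> buildDoubleHashExample() {
    std::vector<int> A(13, EMPTY);
    for (int k : {18, 41, 22, 44, 59, 32, 31, 73}) doubleHashInsert(A, k);
    return A;
}
```

The resulting table holds 31 in cell 0, 41 in 2, 18 in 5, 32 in 6, 59 in 7, 73 in 8, 22 in 9 and 44 in 10. For instance, 31 first hits the occupied cell 5, then cell 9 (step d(31) = 4), and settles in cell (5 + 8) mod 13 = 0.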

8.2.7 A C++ Hash Table Implementation

html-8.1 (HashEntry)
html-8.2 (Position)
html-8.3 (Hash1)
html-8.4 (Hash2)

hash.cpp