Chapter 8: Dictionaries

8.1 The Dictionary Abstract Data Type

8.1.1 The Dictionary ADT
Function              Input                            Output                          Description
size()                -                                Integer                         Return the number of items in D.
isEmpty()             -                                Boolean                         Test whether D is empty.
elements()            -                                Iterator of objects (elements)  Return the elements stored in D.
keys()                -                                Iterator of objects (keys)      Return the keys stored in D.
find(k)               Object (key)                     Position                        If D contains an item with key equal to k, return the position of such an item; otherwise return a null position.
findAll(k)            Object (key)                     Iterator of Positions           Return an iterator of positions for all items whose key equals k.
insertItem(k,e)       Objects k (key) and e (element)  -                               Insert an item with element e and key k into D.
removeElement(k)      Object (key)                     -                               Remove an item with key equal to k from D. An error condition occurs if D has no such item.
removeAllElements(k)  Object (key)                     -                               Remove all items with key equal to k from D.
Operation             Output            Dictionary
insertItem(5,A)       -                 {(5,A)}
insertItem(7,B)       -                 {(5,A),(7,B)}
insertItem(2,C)       -                 {(5,A),(7,B),(2,C)}
insertItem(8,D)       -                 {(5,A),(7,B),(2,C),(8,D)}
insertItem(2,E)       -                 {(5,A),(7,B),(2,C),(8,D),(2,E)}
find(7)               p(B)              {(5,A),(7,B),(2,C),(8,D),(2,E)}
find(4)               "null"            {(5,A),(7,B),(2,C),(8,D),(2,E)}
find(2)               p(C) or p(E)      {(5,A),(7,B),(2,C),(8,D),(2,E)}
findAll(2)            p(C),p(E)         {(5,A),(7,B),(2,C),(8,D),(2,E)}
size()                5                 {(5,A),(7,B),(2,C),(8,D),(2,E)}
removeElement(5)      -                 {(7,B),(2,C),(8,D),(2,E)}
removeElement(5)      "error"           {(7,B),(2,C),(8,D),(2,E)}
removeAllElements(2)  -                 {(7,B),(8,D)}
find(2)               "null"            {(7,B),(8,D)}
findAll(2)            "empty iterator"  {(7,B),(8,D)}
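
The trace above can be replayed against a small stand-in for the ADT. The following is only a sketch under simplifying assumptions: the class name VectorDictionary is hypothetical, keys are ints and elements are strings, items live in an unsorted std::vector, a position is reduced to a vector index with -1 standing in for a null position, and removeElement reports the error condition through a bool return value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the dictionary ADT backed by an unsorted vector.
// Positions are simplified to indices; -1 plays the role of a null position.
class VectorDictionary {
public:
    typedef std::pair<int, std::string> Item;   // (key, element)

    std::size_t size() const { return items.size(); }
    bool isEmpty() const { return items.empty(); }

    void insertItem(int k, const std::string& e) { items.push_back(Item(k, e)); }

    // Index of some item with key k, or -1 if D has no such item.
    int find(int k) const {
        for (std::size_t i = 0; i < items.size(); ++i)
            if (items[i].first == k) return static_cast<int>(i);
        return -1;
    }

    // Indices of all items with key k (empty if there are none).
    std::vector<int> findAll(int k) const {
        std::vector<int> result;
        for (std::size_t i = 0; i < items.size(); ++i)
            if (items[i].first == k) result.push_back(static_cast<int>(i));
        return result;
    }

    // Removes one item with key k; returns false where the ADT signals an error.
    bool removeElement(int k) {
        int i = find(k);
        if (i < 0) return false;
        items.erase(items.begin() + i);
        return true;
    }

    // Removes every item with key k by keeping only the others.
    void removeAllElements(int k) {
        std::vector<Item> kept;
        for (std::size_t i = 0; i < items.size(); ++i)
            if (items[i].first != k) kept.push_back(items[i]);
        items.swap(kept);
    }

    // Item stored at a valid (non-null) index.
    const Item& at(int i) const {
        assert(i >= 0 && static_cast<std::size_t>(i) < items.size());
        return items[i];
    }

private:
    std::vector<Item> items;
};
```

Replaying the operations of the trace on this sketch gives size 5 after the five insertions, a failed second removeElement(5), and an empty findAll(2) at the end.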
Operation  Input  Output            Description
element()  -      Object (element)  Return a reference to the element of the associated item.
key()      -      Object (key)      Return a constant reference to the key of the associated item.
isNull()   -      Boolean           Determine if this is a null position.

8.1.2 Log Files

8.2 Hash Tables

8.2.1 Bucket Arrays

Data Structures and Algorithms in C++
Analysis of the Bucket Array Structure
8.2.2 Hash Functions

  • We design a hash table for a dictionary storing items (SSN, Name), where SSN (social security number) is a nine-digit positive integer
  • Our hash table uses an array of size N = 10 000 and the hash function h(x) = last four digits of x
  • To avoid all collisions, we would need N = 1 000 000 000 (one cell for every possible nine-digit SSN) and the hash function h(x) = x (Drawback 1)
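
As a minimal sketch of the first design, taking the last four digits of x is the same as computing x mod 10 000. The function name hashSSN and the constant name TABLE_SIZE are hypothetical, and long is assumed to be wide enough to hold a nine-digit integer (it is guaranteed to hold at least 32 bits).

```cpp
#include <cassert>

// Table size from the example: one cell per possible four-digit suffix.
const int TABLE_SIZE = 10000;

// Hypothetical helper: maps a nine-digit SSN to a cell index in [0, TABLE_SIZE).
int hashSSN(long ssn) {
    assert(ssn > 0 && ssn <= 999999999L);        // nine-digit positive integer
    return static_cast<int>(ssn % TABLE_SIZE);   // last four digits of x
}
```

Two different SSNs that end in the same four digits, such as 123456789 and 987656789, land in the same cell, which is exactly the kind of collision the scheme has to handle.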

8.2.3 Hash Codes
Hash Codes in C++
A Small C++ Example
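Hash code for a 32-bit integer, given a 32-bit integer hash function:
int hashCode(int x)
{ return x; }
Hash code for a 64-bit integer, given a 32-bit integer hash function (assumes long is 64 bits):
int hashCode(long x)
{ typedef unsigned long ulong;
  ulong u = static_cast<ulong>(x);
  // add the high and low 32-bit halves as unsigned values to avoid signed overflow
  unsigned int sum = static_cast<unsigned int>(u >> 32) + static_cast<unsigned int>(u);
  return hashCode(static_cast<int>(sum));
}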
Polynomial Hash Codes
Cyclic Shift Hash Codes
int hashCode(const char* p, int len)   // hash a character array
{ unsigned int h = 0;
  for (int i = 0; i < len; i++)
  { h = (h << 5) | (h >> 27);          // 5-bit cyclic shift
    h += (unsigned int)p[i];           // add in next character
  }
  return hashCode(int(h));
}
Experimental Results
Collisions for 25000 English words, by shift amount:

Shift  Collisions (Total)  Collisions (Max)
0      23739               86
1      10517               21
5      4                   2
6      6                   2
11     453                 4

Hashing Floating-Point Quantities
int hashCode(const double& x)          // hash a double
{ int len = sizeof(x);
  const char* p = reinterpret_cast<const char*>(&x);
  return hashCode(p, len);
}
C++ provides the reinterpret_cast operator to cast between unrelated types, here from a pointer to double to a pointer to char.
This cast treats quantities as a sequence of bits and makes no attempt to intelligently convert the meaning of one quantity to another.

hash_code.cpp

8.2.4 Compression Maps

The Division Method
The MAD Method
Multiply, Add and Divide (MAD):
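
In the usual textbook formulation, the division method maps an integer hash code k to |k| mod N, and the MAD method maps it to |ak + b| mod N, where N is the table size (typically a prime) and a and b are nonnegative integers with a mod N ≠ 0. The sketch below assumes that formulation; the function names compressDivision and compressMAD are hypothetical, and the values of a and b used in the note after it are arbitrary.

```cpp
#include <cassert>
#include <cstdlib>

// Division method: |k| mod N.
int compressDivision(int k, int N) {
    assert(N > 0);
    long long wide = k;   // widen first so that |INT_MIN| is representable
    return static_cast<int>(std::llabs(wide) % N);
}

// MAD method: |a*k + b| mod N, with a mod N != 0.
int compressMAD(int k, int N, int a, int b) {
    assert(N > 0 && a % N != 0);
    long long v = static_cast<long long>(a) * k + b;   // 64-bit arithmetic avoids overflow
    return static_cast<int>(std::llabs(v) % N);
}
```

With N = 13, a = 3 and b = 7, the hash code 18 is compressed to 5 by the division method and to 9 by the MAD method.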
8.2.5 Collision-Handling Schemes
Separate Chaining

Open Addressing Approach
Open addressing: the colliding item is placed in a different cell of the table

Linear Probing

  • h(x) = x mod 13
  • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
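
The example can be worked through with a short sketch. It assumes the table stores bare non-negative integer keys, uses -1 as a hypothetical marker for an empty cell, and ignores removals; the names EMPTY, insertLinear and buildLinearExample are illustrative only.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;   // sentinel for an unused cell (keys are assumed non-negative)

// Inserts key k into table A using h(x) = x mod N and linear probing.
// Returns the index where k was stored, or -1 if the table is full.
int insertLinear(std::vector<int>& A, int k) {
    const int N = static_cast<int>(A.size());
    assert(N > 0 && k >= 0);
    int i = k % N;                        // start at cell h(k)
    for (int p = 0; p < N; ++p) {         // probe at most N cells
        if (A[i] == EMPTY) { A[i] = k; return i; }
        i = (i + 1) % N;                  // move to the next cell, wrapping around
    }
    return -1;
}

// Builds the example table: N = 13, keys inserted in the order listed above.
std::vector<int> buildLinearExample() {
    const int keys[] = {18, 41, 22, 44, 59, 32, 31, 73};
    std::vector<int> A(13, EMPTY);
    for (int j = 0; j < 8; ++j) insertLinear(A, keys[j]);
    return A;
}
```

Running the insertions by hand or with this sketch places 41 in cell 2, 18 in cell 5, 44 in cell 6, 59 in cell 7, 32 in cell 8, 22 in cell 9, 31 in cell 10 and 73 in cell 11. Keys 44, 32, 31 and 73 all collide and are pushed into one contiguous run of cells.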

Search with Linear Probing
  • Consider a hash table A that uses linear probing
  • find(k)
    • We start at cell h(k)
    • We probe consecutive locations until one of the following occurs:
      • An item with key k is found, or
      • An empty cell is found, or
      • N cells have been unsuccessfully probed
Algorithm find(k)
   i ← h(k)
   p ← 0
   repeat
      c ← A[i]
      if c = ∅
          return Position(null)
      else if c.key() = k
          return Position(c)
      else
         i ← (i + 1) mod N
         p ← p + 1
   until p = N
   return Position(null)
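
The pseudocode translates almost line for line into C++. The sketch below makes several simplifying assumptions: cells hold bare non-negative integer keys rather than items, h(k) = k mod N, the hypothetical sentinel EMPTY_CELL marks an empty cell, and the returned index (or -1) stands in for a Position. It also assumes no item has ever been removed, since an emptied cell would cut a probe sequence short.

```cpp
#include <cassert>
#include <vector>

const int EMPTY_CELL = -1;   // hypothetical marker for an empty cell

// Returns the index of key k in A, or -1 in place of a null position.
int findLinear(const std::vector<int>& A, int k) {
    const int N = static_cast<int>(A.size());
    assert(N > 0 && k >= 0);
    int i = k % N;                            // i <- h(k)
    for (int p = 0; p < N; ++p) {             // repeat ... until p = N
        if (A[i] == EMPTY_CELL) return -1;    // empty cell found: k is absent
        if (A[i] == k) return i;              // item with key k found
        i = (i + 1) % N;                      // i <- (i + 1) mod N
    }
    return -1;                                // N cells probed unsuccessfully
}
```

All three stopping conditions from the list above appear in the loop: a match, an empty cell, and the bound of N probes, which matters only when the table is completely full.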

Updates with Linear Probing
Double Hashing

  • Consider a hash table storing integer keys that handles collisions with double hashing
    • N = 13
    • h(k) = k mod 13
    • d(k) = 7 − (k mod 7)
  • Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order
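
A short sketch of this example follows, assuming the j-th probe for key k examines cell (h(k) + j·d(k)) mod N for j = 0, 1, 2, and so on. The sentinel FREE and the function names are hypothetical, and the table stores bare non-negative keys.

```cpp
#include <cassert>
#include <vector>

const int FREE = -1;   // hypothetical marker for an unused cell

int h(int k) { return k % 13; }        // primary hash function
int d(int k) { return 7 - (k % 7); }   // secondary hash: always in 1..7, never 0

// Inserts k with double hashing; returns the cell used, or -1 if none was found.
int insertDouble(std::vector<int>& A, int k) {
    const int N = static_cast<int>(A.size());
    assert(N == 13 && k >= 0);         // h() is tied to N = 13 in this example
    for (int j = 0; j < N; ++j) {
        int i = (h(k) + j * d(k)) % N; // j-th cell of the probe sequence
        if (A[i] == FREE) { A[i] = k; return i; }
    }
    return -1;
}

// Builds the example table by inserting the keys in the order listed above.
std::vector<int> buildDoubleExample() {
    const int keys[] = {18, 41, 22, 44, 59, 32, 31, 73};
    std::vector<int> A(13, FREE);
    for (int j = 0; j < 8; ++j) insertDouble(A, keys[j]);
    return A;
}
```

Only two keys collide here. Key 44 has h = 5 and d = 5, so it moves from cell 5 to cell 10. Key 31 has h = 5 and d = 4, so it probes cells 5 and 9 (both occupied) and settles in cell 0. The final table holds 31, 41, 18, 32, 59, 73, 22 and 44 in cells 0, 2, 5, 6, 7, 8, 9 and 10.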

8.2.7 A C++ Hash Table Implementation

html-8.1 (HashEntry)
html-8.2 (Position)
html-8.3 (Hash1)
html-8.4 (Hash2)

hash.cpp


8.3 Ordered Dictionaries

In an ordered dictionary, we wish to perform the usual dictionary operations, but also maintain an order relation for the keys in our dictionary.

8.3.1 The Ordered Dictionary ADT

An ordered dictionary supports the following functions beyond those included in the general dictionary ADT (8.1.1):
8.3.2 Look-Up Tables
8.3.3 Binary Search

bsearch.cpp

Analysis of Binary Search
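
To make the logarithmic behavior concrete, here is a minimal sketch of binary search over a sorted std::vector of integer keys that also counts how many elements it examines. It is not the contents of bsearch.cpp; the function name, the probe counter and the use of -1 for "not found" are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the index of k in the sorted vector S, or -1 if k is absent.
// If probes is non-null, it receives the number of elements examined.
int binarySearch(const std::vector<int>& S, int k, int* probes = 0) {
    assert(std::is_sorted(S.begin(), S.end()));   // precondition (checked in debug builds only)
    int low = 0;
    int high = static_cast<int>(S.size()) - 1;
    int count = 0;
    int found = -1;
    while (low <= high) {
        int mid = low + (high - low) / 2;         // midpoint without overflow
        ++count;
        if (S[mid] == k) { found = mid; break; }
        if (S[mid] < k) low = mid + 1;            // discard the lower half
        else            high = mid - 1;           // discard the upper half
    }
    if (probes) *probes = count;
    return found;
}
```

Each iteration at least halves the remaining range, so a search over n keys examines at most ⌊log₂ n⌋ + 1 elements. For n = 1024 that is 11 probes at most.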

Comparing Simple Ordered Dictionary Implementations

Function                   Log File  Look-Up Table
size(), isEmpty()          O(1)      O(1)
keys(), elements()         O(n)      O(n)
find(key)                  O(n)      O(log n)
findAll(key)               Θ(n)      O(log n + s)
insertItem(key, element)   O(1)      O(n)
removeElement(key)         O(n)      O(n)
removeAllElements(key)     Θ(n)      O(n)