Chapter 8: Dictionaries

Like a priority queue, a dictionary is a container of key-element pairs. However, while a total order relation on the keys is always required for a priority queue, it is optional for a dictionary. Indeed, the simplest form of a dictionary assumes only that we can determine whether two keys are equal. When a total order relation on the keys is defined, we can talk about an ordered dictionary, and we specify additional ADT functions that refer to the ordering of the keys.

8.1 The Dictionary Abstract Data Type

A dictionary ADT stores key-element pairs (k,e), which we call items, where k is the key and e is the element.
In an unordered dictionary, an equality tester object tests whether two keys, k1 and k2, are equal, using the function isEqualTo(k1, k2).
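As an illustration of why equality is delegated to a separate object, here is a minimal sketch of an equality tester for string keys that ignores letter case. The class name CaseInsensitiveEqual is hypothetical; only the function name isEqualTo comes from the text.

```cpp
#include <cctype>
#include <string>

// Hypothetical equality tester: two string keys are equal if they match
// character by character after converting each character to lower case.
class CaseInsensitiveEqual {
public:
    bool isEqualTo(const std::string& k1, const std::string& k2) const {
        if (k1.size() != k2.size()) return false;
        for (std::string::size_type i = 0; i < k1.size(); ++i) {
            if (std::tolower(static_cast<unsigned char>(k1[i])) !=
                std::tolower(static_cast<unsigned char>(k2[i])))
                return false;
        }
        return true;
    }
};
```

A dictionary parameterized by such an object can change its notion of key equality without any change to the dictionary code itself.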

8.1.1 The Dictionary ADT

HashTables.pdf 2
As an ADT, a dictionary D supports the following functions:

Function              Input                            Output                          Description
size()                -                                Integer                         Return the number of items in D.
isEmpty()             -                                Boolean                         Test whether D is empty.
elements()            -                                Iterator of objects (elements)  Return the elements stored in D.
keys()                -                                Iterator of objects (keys)      Return the keys stored in D.
find(k)               Object (key)                     Position                        If D contains an item with key equal to k, return the position of such an item; otherwise return a null position.
findAll(k)            Object (key)                     Iterator of Positions           Return an iterator of positions for all items whose key equals k.
insertItem(k,e)       Objects k (key) and e (element)  -                               Insert an item with element e and key k into D.
removeElement(k)      Object (key)                     -                               Remove an item with key equal to k from D. An error condition occurs if D has no such item.
removeAllElements(k)  Object (key)                     -                               Remove the items with key equal to k from D.

Remarks: The way the items of a dictionary are stored is implementation dependent. The notation p(x) indicates the position of the item storing element x.

Operation              Output             Dictionary
insertItem(5,A)        -                  {(5,A)}
insertItem(7,B)        -                  {(5,A),(7,B)}
insertItem(2,C)        -                  {(5,A),(7,B),(2,C)}
insertItem(8,D)        -                  {(5,A),(7,B),(2,C),(8,D)}
insertItem(2,E)        -                  {(5,A),(7,B),(2,C),(8,D),(2,E)}
find(7)                p(B)               {(5,A),(7,B),(2,C),(8,D),(2,E)}
find(4)                "null"             {(5,A),(7,B),(2,C),(8,D),(2,E)}
find(2)                p(C) or p(E)       {(5,A),(7,B),(2,C),(8,D),(2,E)}
findAll(2)             p(C),p(E)          {(5,A),(7,B),(2,C),(8,D),(2,E)}
size()                 5                  {(5,A),(7,B),(2,C),(8,D),(2,E)}
removeElement(5)       -                  {(7,B),(2,C),(8,D),(2,E)}
removeElement(5)       "error"            {(7,B),(2,C),(8,D),(2,E)}
removeAllElements(2)   -                  {(7,B),(8,D)}
find(2)                "null"             {(7,B),(8,D)}
findAll(2)             "empty iterator"   {(7,B),(8,D)}

Position class provides:

Operation   Input  Output            Description
element()   -      Object (element)  Return a reference to the element of the associated item.
key()       -      Object (key)      Return a constant reference to the key of the associated item.
isNull()    -      Boolean           Determine if this is a null position.

8.1.2 Log Files

A simple way of realizing a dictionary is to use an unordered vector, list, or general sequence to store the key-element pairs. Such an implementation is called a log file.
HashTables.pdf 3

Unordered Sequence Implementation
HashTables.pdf 3
The sequence S used for the log file is implemented with either a vector or a doubly linked list.
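A minimal sketch of a log file, assuming integer keys, string elements, and std::list as the sequence S. The class name LogFile and the simplified return types (a pointer in place of a Position, a bool in place of an error condition) are illustrative assumptions, not the textbook's code.

```cpp
#include <list>
#include <string>
#include <utility>

// Hypothetical log-file dictionary: an unordered list of (key, element) items.
class LogFile {
public:
    typedef std::pair<int, std::string> Item;

    int size() const { return static_cast<int>(items.size()); }
    bool isEmpty() const { return items.empty(); }

    // O(1): append without inspecting existing items (duplicate keys allowed).
    void insertItem(int k, const std::string& e) { items.push_back(Item(k, e)); }

    // O(n): scan for the first item with key k; returns nullptr if there is none.
    const Item* find(int k) const {
        for (std::list<Item>::const_iterator it = items.begin(); it != items.end(); ++it)
            if (it->first == k) return &*it;
        return nullptr;
    }

    // O(n): remove the first item with key k; returns false if there is none.
    bool removeElement(int k) {
        for (std::list<Item>::iterator it = items.begin(); it != items.end(); ++it)
            if (it->first == k) { items.erase(it); return true; }
        return false;
    }

private:
    std::list<Item> items;  // the sequence S
};
```

Insertion is cheap because nothing is searched, while every lookup or removal may have to scan the whole sequence.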

Analysis of the Log File Data Structure
HashTables.pdf 3

Applications for Log Files
HashTables.pdf 3

8.2 Hash Tables

One of the most efficient ways to implement a dictionary is to use a hash table. Although hash tables have high worst-case running times for dictionary ADT operations, we will see that their expected-case running times are excellent. Letting n denote the number of items, the worst-case running times are O(n), but the expected-case times are only O(1).

8.2.1 Bucket Arrays

A bucket array for a hash table is an array A of size N, where each cell of A is thought of as a "bucket" (that is, a container of key-element pairs) and the integer N denotes the capacity of the array. If the keys are integers well distributed in the range [0,N-1], this bucket array is all that is needed: an element e with key k is simply inserted into the bucket A[k].
If keys are not unique, then two different elements may be mapped to the same bucket in A. In this case, we say that a collision has occurred.
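The idea can be sketched as follows, assuming integer keys already in the range [0,N-1] and string elements. The class name BucketArray and the choice to report a collision through the return value are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical bucket array for integer keys in the range [0, N-1].
class BucketArray {
public:
    explicit BucketArray(int N) : A(N) {}

    // Insert element e into bucket A[k]. Returns true if this caused a
    // collision, i.e. the bucket already held an element. Assumes 0 <= k < N.
    bool insert(int k, const std::string& e) {
        bool collision = !A[k].empty();
        A[k].push_back(e);
        return collision;
    }

    // Read-only access to the contents of bucket A[k].
    const std::vector<std::string>& bucket(int k) const { return A[k]; }

private:
    std::vector<std::vector<std::string> > A;  // A[k] is the bucket for key k
};
```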

Analysis of the Bucket Array Structure

8.2.2 Hash Functions

HashTables.pdf 4-6
A hash function is "good" if it maps the keys in our dictionary so as to minimize collisions as much as possible.
It should also be fast and easy to compute.

8.2.3 Hash Codes

The integer assigned to a key k is called the hash code or hash value for k.

Hash Codes in C++
HashTables.pdf 7

Casting to an Integer
For a data type X, take an integer interpretation of its bits as the hash code for X.
HashTables.pdf 7

Summing Components
HashTables.pdf 7

A Small C++ Example
Hashing a 64-bit integer when the hash code is a 32-bit integer (assumes long is 64 bits):
int hashCode(int x)                     // an int is its own hash code
{ return x; }

int hashCode(long x)                    // add the high-order and low-order 32-bit halves
{ typedef unsigned long ulong;
  ulong u = static_cast<ulong>(x);
  // add as unsigned so that overflow wraps around instead of being undefined
  return hashCode(static_cast<int>(static_cast<unsigned int>(u >> 32)
                                   + static_cast<unsigned int>(u)));
}

Polynomial Hash Codes
HashTables.pdf 8
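A minimal sketch of a polynomial hash code for strings, evaluated with Horner's rule. The function name polyHashCode and the default multiplier a = 33 are assumptions for illustration; the details on the referenced slide may differ.

```cpp
#include <string>

// Polynomial hash code: treats the characters x0, x1, ..., x(n-1) of s as
// the coefficients of x0*a^(n-1) + x1*a^(n-2) + ... + x(n-1).
// The multiplier a is a free parameter; 33 is an assumed default.
unsigned int polyHashCode(const std::string& s, unsigned int a = 33) {
    unsigned int h = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Horner's rule; unsigned arithmetic wraps around on overflow
        h = h * a + static_cast<unsigned char>(s[i]);
    }
    return h;
}
```

Unlike a plain sum of the characters, this code depends on character positions, so "ab" and "ba" receive different hash codes.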

Cyclic Shift Hash Codes

int hashCode(const char* p, int len)    // hash a character array
{ unsigned int h = 0;
  for (int i=0; i<len; i++)
  { h = (h<<5)|(h>>27);                 // 5-bit cyclic shift
    h += (unsigned int)p[i];            // add in next character
  }
  return hashCode(int(h));
}

Experimental Results
Collisions for different shift amounts, 25000 English words:

Shift  Collisions (Total)  Collisions (Max)
0      23739               86
1      10517               21
5      4                   2
6      6                   2
11     453                 4

Hashing Floating-Point Quantities

int hashCode(const double& x)           // hash a double
{ int len = sizeof(x);
  const char* p = reinterpret_cast<const char *>(&x);
  return hashCode(p, len);
}

8.2.4 Compression Maps

The Division Method
HashTables.pdf 9

The MAD Method
HashTables.pdf 9
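The two compression maps can be sketched as follows, assuming the usual definitions: the division method maps a hash code k to |k| mod N, and the MAD (multiply, add and divide) method maps it to |a*k + b| mod N. The function names are hypothetical, and the values of N, a, and b are left to the caller.

```cpp
#include <cassert>

// Division method: map hash code k into [0, N-1] as |k| mod N.
int compressDivision(int k, int N) {
    assert(N > 0);
    long long r = static_cast<long long>(k) % N;  // same magnitude as |k| mod N
    return static_cast<int>(r < 0 ? -r : r);
}

// MAD method: map hash code k into [0, N-1] as |a*k + b| mod N.
// Intended use: N prime, a and b nonnegative, a mod N != 0.
int compressMAD(int k, int N, int a, int b) {
    assert(N > 0 && a % N != 0);
    long long v = static_cast<long long>(a) * k + b;  // 64-bit intermediate avoids overflow
    long long r = v % N;
    return static_cast<int>(r < 0 ? -r : r);
}
```

For example, with N = 7, a = 3, and b = 4, the hash code 10 is mapped to 3 by the division method and to 6 by the MAD method.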

8.2.5 Collision-Handling Schemes

HashTables.pdf 10

Separate Chaining
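A minimal sketch of separate chaining, in which each bucket holds a small list (a chain) of the items that hash to it. The class name ChainHashTable, the int/string item types, and the use of a simple modulus as the hash function are assumptions for illustration.

```cpp
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash table with int keys and string elements.
class ChainHashTable {
public:
    typedef std::pair<int, std::string> Item;

    explicit ChainHashTable(int N) : A(N), n(0) {}

    int size() const { return n; }

    // Add (k, e) to the chain of bucket h(k); colliding items share a chain.
    void insertItem(int k, const std::string& e) {
        A[h(k)].push_back(Item(k, e));
        ++n;
    }

    // Search only the chain of bucket h(k); returns nullptr if k is absent.
    const std::string* find(int k) const {
        const std::list<Item>& chain = A[h(k)];
        for (std::list<Item>::const_iterator it = chain.begin(); it != chain.end(); ++it)
            if (it->first == k) return &it->second;
        return nullptr;
    }

private:
    // Simple modulus; the result is in [0, N-1] even for negative k.
    int h(int k) const {
        int N = static_cast<int>(A.size());
        return ((k % N) + N) % N;
    }

    std::vector<std::list<Item> > A;  // bucket array of chains
    int n;                            // number of items stored
};
```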

Open Addressing

Linear Probing
HashTables.pdf 11-13
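A minimal sketch of open addressing with linear probing: on a collision at cell h(k), the cells h(k)+1, h(k)+2, ... (mod N) are tried in turn. The class name LinearProbeTable, the int/string item types, and the three-state cell marker are assumptions for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical open-addressing hash table that uses linear probing.
class LinearProbeTable {
public:
    explicit LinearProbeTable(int N) : cells(N), n(0) {}

    int size() const { return n; }

    // Probe from h(k) for the first cell that is not in use.
    // Returns false if every cell is in use.
    bool insertItem(int k, const std::string& e) {
        int N = static_cast<int>(cells.size());
        for (int i = 0; i < N; ++i) {
            Cell& c = cells[(h(k) + i) % N];
            if (c.state != USED) {
                c.key = k; c.elem = e; c.state = USED;
                ++n;
                return true;
            }
        }
        return false;
    }

    // Follow the same probe sequence; an EMPTY cell ends the search.
    const std::string* find(int k) const {
        int N = static_cast<int>(cells.size());
        for (int i = 0; i < N; ++i) {
            const Cell& c = cells[(h(k) + i) % N];
            if (c.state == EMPTY) return nullptr;
            if (c.state == USED && c.key == k) return &c.elem;
        }
        return nullptr;
    }

    // Mark the cell DELETED rather than EMPTY so later searches probe past it.
    bool removeElement(int k) {
        int N = static_cast<int>(cells.size());
        for (int i = 0; i < N; ++i) {
            Cell& c = cells[(h(k) + i) % N];
            if (c.state == EMPTY) return false;
            if (c.state == USED && c.key == k) { c.state = DELETED; --n; return true; }
        }
        return false;
    }

private:
    enum State { EMPTY, USED, DELETED };
    struct Cell {
        int key; std::string elem; State state;
        Cell() : key(0), state(EMPTY) {}
    };

    int h(int k) const {
        int N = static_cast<int>(cells.size());
        return ((k % N) + N) % N;
    }

    std::vector<Cell> cells;
    int n;
};
```

The DELETED marker matters: if a removed cell were reset to EMPTY, a search for an item stored further along the same probe sequence would stop too early.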

Quadratic Probing

Double Hashing
HashTables.pdf 14-15
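A sketch of the probe sequence under double hashing, assuming a primary hash h(k) = k mod N and a secondary hash h'(k) = q - (k mod q) for a prime q < N, which is one common choice. The function name doubleHashProbe is hypothetical, and the referenced slides may use a different secondary hash.

```cpp
// Index of the i-th cell probed for key k under double hashing, for a table
// of prime capacity N and a prime q < N (both supplied by the caller).
int doubleHashProbe(int k, int i, int N, int q) {
    int h  = ((k % N) + N) % N;        // primary hash, in [0, N-1]
    int h2 = q - (((k % q) + q) % q);  // secondary hash, in [1, q], never 0
    return static_cast<int>((h + static_cast<long long>(i) * h2) % N);
}
```

With N = 13, q = 7, and k = 18, the primary hash is 5 and the secondary hash is 3, so the cells probed are 5, 8, 11, 1, and so on.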

8.2.6 Load Factors and Rehashing

Rehashing into a New Table
HashTables.pdf 16
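A minimal sketch of rehashing, assuming a chained table of integer keys that grows when the load factor n/N would exceed a threshold. The class name RehashingSet, the threshold 0.75, and the growth rule 2N+1 (which is not guaranteed to be prime) are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical chained table of int keys that rehashes into a larger
// bucket array when the load factor n/N would exceed 0.75.
class RehashingSet {
public:
    RehashingSet() : A(4), n(0) {}

    std::size_t size() const { return n; }
    std::size_t capacity() const { return A.size(); }

    void insert(int k) {
        if (static_cast<double>(n + 1) / A.size() > 0.75)
            rehash(2 * A.size() + 1);
        A[index(k, A.size())].push_back(k);
        ++n;
    }

    bool contains(int k) const {
        const std::vector<int>& b = A[index(k, A.size())];
        for (std::size_t i = 0; i < b.size(); ++i)
            if (b[i] == k) return true;
        return false;
    }

private:
    // Bucket index of key k in a table of capacity N.
    static std::size_t index(int k, std::size_t N) {
        return static_cast<std::size_t>(static_cast<unsigned int>(k)) % N;
    }

    // Allocate a new bucket array and reinsert every key under the new capacity.
    void rehash(std::size_t newN) {
        std::vector<std::vector<int> > B(newN);
        for (std::size_t i = 0; i < A.size(); ++i)
            for (std::size_t j = 0; j < A[i].size(); ++j)
                B[index(A[i][j], newN)].push_back(A[i][j]);
        A.swap(B);
    }

    std::vector<std::vector<int> > A;  // bucket array
    std::size_t n;                     // number of keys stored
};
```

Every key must be reinserted because its bucket index depends on the capacity, so a single rehash costs time proportional to the number of stored keys.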

8.2.7 A C++ Hash Table Implementation

html-8.1 (HashEntry)
html-8.2 (Position)
html-8.3 (Hash1)
html-8.4 (Hash2)

hash.cpp


8.3 Ordered Dictionaries

In an ordered dictionary, we wish to perform the usual dictionary operations, but also maintain an order relation for the keys in our dictionary.

8.3.1 The Ordered Dictionary ADT

An ordered dictionary supports the following functions beyond those included in the general dictionary ADT (8.1.1):

8.3.2 Look-Up Tables

Dictionary.pdf 6

8.3.3 Binary Search

Dictionary.pdf 5

bsearch.cpp

Analysis of Binary Search
The running time is proportional to the number m of recursive calls. The number of remaining candidates is reduced by at least one half with each recursive call. In the worst case (an unsuccessful search), the recursion stops when there are no more candidates, i.e. when n/2^m = 1, which gives m = log n (base 2), and we obtain O(log n) running time.
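The recursion analyzed above can be sketched as follows, assuming a sorted std::vector of integer keys. This is an illustrative version and not necessarily the code in bsearch.cpp.

```cpp
#include <vector>

// Recursive binary search over the sorted range S[low..high].
// Returns the index of an item with key k, or -1 if there is none.
int binarySearch(const std::vector<int>& S, int k, int low, int high) {
    if (low > high) return -1;          // no candidates remain
    int mid = low + (high - low) / 2;   // midpoint, written to avoid overflow of low + high
    if (S[mid] == k) return mid;        // found
    if (k < S[mid]) return binarySearch(S, k, low, mid - 1);  // continue in left half
    return binarySearch(S, k, mid + 1, high);                 // continue in right half
}
```

Each call discards at least half of the remaining candidates, which is the source of the O(log n) bound.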

Using Look-Up Tables as Ordered Dictionaries
Dictionary.pdf 6

Comparing Simple Ordered Dictionary Implementations

Function                   Log File   Look-Up Table
size(), isEmpty()          O(1)       O(1)
keys(), elements()         O(n)       O(n)
find(key)                  O(n)       O(log n)
findAll(key)               Theta(n)   O(log n + s)
insertItem(key, element)   O(1)       O(n)
removeElement(key)         O(n)       O(n)
removeAllElements(key)     Theta(n)   O(n)

Here n denotes the number of items in the dictionary and s the number of items returned by findAll.