2. Evaluation and complexity of algorithms [1.4]

Three main features of a computer algorithm:
-- simplicity and elegance;
-- correctness;
-- speed.

Let's look at the following snippet:

int n, sum, i, j;
cin >> n;
sum = 0;
for (i = 0; i < n; i++)
 for (j = 0; j < n; j++) sum++;
 

* How fast will the above fragment run, and by what criteria is its speed determined?
* We can experimentally measure how long the program takes (see the timing sketch below).
* To investigate its behavior more generally, we run it with different values of n.
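
A minimal timing sketch, assuming measurements are taken with std::chrono (the harness is an illustration added here, not part of the original fragment):

#include <iostream>
#include <chrono>
using namespace std;

int main() {
    int n;
    cin >> n;
    auto start = chrono::steady_clock::now();   // start the clock
    long long sum = 0;                          // long long: sum reaches n*n
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) sum++;      // the fragment under test
    auto finish = chrono::steady_clock::now();
    chrono::duration<double> elapsed = finish - start;
    cout << "n = " << n << ", time = " << elapsed.count() << " s\n";
    return 0;
}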


* The results are summarized in the following table:

Input size n | Execution time (sec)
-------------|---------------------
10           | 10^-6
100          | 10^-4
1000         | 0.01
10^4         | 1.071
10^5         | 106.5
10^6         | 10663.6

* The table shows that when we increase n (the size of the input) 10 times, the execution time increases about 100 times.
 
* The execution time is proportional to g(n) = c1·n^2 + c2·n + c3, where c1, c2, c3 are constants determined by the given piece of code (see below): the inner increment executes n^2 times, the outer-loop bookkeeping runs n times, and the input is read in constant time.
 
* Comparison of two functions:
   g1(n) = 2n^2  and  g2(n) = 200n,
which give the execution times of two algorithms A1 and A2 as functions of n.
 

* Asymptotically, algorithm A2 is faster: its complexity is linear, while that of A1 is quadratic. The two functions are equal at n = 100; beyond that point A2 always wins.

n    | g1(n) = 2n^2 | g2(n) = 200n
-----|--------------|-------------
1    | 2            | 200
10   | 200          | 2000
100  | 2·10^4       | 2·10^4
1000 | 2·10^6       | 2·10^5
10^4 | 2·10^8       | 2·10^6
10^6 | 2·10^12      | 2·10^8

 Input data size

* Consider a task in which the size of the input data is characterized by an integer n.
* Almost all the tasks we will look at have this property.
* We illustrate this with a few examples:

Example 1.
Sort an array with n elements.
The size of the input data is determined by the number n of the array elements.
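
For instance, a call to the standard library sort (an illustrative fragment, not part of the original text):

#include <algorithm>
#include <vector>
using namespace std;

int main() {
    vector<int> a = {5, 2, 9, 1};   // here n = a.size() = 4
    sort(a.begin(), a.end());       // a becomes {1, 2, 5, 9}
    return 0;
}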

Example 2.
Find the greatest common divisor of a and b.
In this example, the size of the input is determined by the number of binary digits (bits) of the larger of the numbers a and b.
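
A minimal sketch of Euclid's algorithm, the classical method for this task (the code is an illustration added here, not from the original text):

// The number of loop iterations is O(log min(a, b)), i.e. at most
// proportional to the number of bits of the input numbers.
unsigned gcd(unsigned a, unsigned b) {
    while (b != 0) {          // invariant: gcd(a, b) never changes
        unsigned r = a % b;   // the remainder is strictly smaller than b
        a = b;
        b = r;
    }
    return a;                 // gcd(a, 0) == a
}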

Example 3.
Find a spanning tree of a graph.
In this case, we characterize the size of the input by two numbers: the number of vertices and the number of edges.

Asymptotic notation

* When we speak about the complexity of an algorithm, we are most often interested in how it behaves for a sufficiently large size n of the input data.
* Accordingly, when formally evaluating the complexity of algorithms, we examine their behavior for "sufficiently large" n.

1. O(f) denotes the set of all functions g that grow no faster than f, i.e. there exists a constant c > 0 such that g(n) <= c·f(n) for all sufficiently large values of n.

2. Theta(f) denotes the set of all functions g that grow as fast as f (up to a constant factor), i.e. there exist constants c1 > 0 and c2 > 0 such that
c1·f(n) <= g(n) <= c2·f(n) for all sufficiently large values of n.

3. Omega(f) denotes the set of all functions g that grow no slower than f, i.e. there exists a constant c > 0 such that g(n) >= c·f(n) for all sufficiently large values of n.
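
A worked example for concreteness (added here, not in the original): g(n) = 2n^2 + 3n belongs to O(n^2), since 2n^2 + 3n <= 3n^2 for all n >= 3 (take c = 3); it also belongs to Omega(n^2), since 2n^2 + 3n >= 2n^2 (take c = 2), and therefore to Theta(n^2).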

O(f): Properties and examples

* The notation O(f) is the most commonly used in evaluating the complexity of algorithms and programs.
* More important properties of O(f) (here ~ denotes "belongs to"):

  • Reflexivity: f ~ O(f);
  • Transitivity: if f ~ O(g) and g ~ O(h), then f ~ O(h);
  • Transposed symmetry: if f ~ Omega(g), then g ~ O(f), and vice versa;
  • Constants can be ignored: for every k > 0, k·f ~ O(f);
  • Higher powers of n grow faster: n^r ~ O(n^s) for 0 < r < s;
  • The growth rate of a sum of functions is determined by the fastest-growing of them: f + g ~ max(O(f), O(g));
  • If f(n) is a polynomial of degree d, then f ~ O(n^d);
  • Growth rate of commonly used functions:

    Function \ n |  1  |  2  |   10    |   100   |   1000
    5            |  5  |  5  |    5    |    5    |    5
    log n        |  0  |  1  |  3.32   |  6.64   |   9.96
    n            |  1  |  2  |   10    |   100   |   1000
    n log n      |  0  |  2  |  33.2   |   664   |   9966
    n^2          |  1  |  4  |   100   |  10^4   |   10^6
    n^3          |  1  |  8  |  1000   |  10^6   |   10^9
    2^n          |  2  |  4  |  1024   | ~10^30  | ~10^300
    n!           |  1  |  2  | 3628800 | ~10^157 | ~10^2567
    n^n          |  1  |  4  |  10^10  | 10^200  | 10^3000
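
    * The table shows the practical divide: for n = 100 an O(n log n) algorithm needs a few hundred steps, while an O(2^n) one needs about 10^30, so polynomial algorithms remain usable where exponential ones become hopeless.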

Determining the complexity of an algorithm

* We look for the function that relates the size of the input data to the running time.
* We usually consider the worst case, i.e. the worst-case inputs.
  - an elementary operation does not depend on the amount of data processed: O(1);
  - a sequence of operators is determined by the asymptotically slowest one: f + g ~ max(O(f), O(g));
  - composition of operators (one applied inside the other) multiplies the complexities: f(g) ~ O(f·g);
  - a conditional operator is determined by the asymptotically slowest among the condition and the branches;
  - a loop, two nested loops, p nested loops: O(n), O(n^2), O(n^p) (see the annotated fragment below).
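
A small fragment annotated with these rules (added as an illustration; the variables are arbitrary):

int x = 0;                       // elementary operation: O(1)
for (int i = 0; i < n; i++)      // single loop: O(n)
    x += i;
for (int i = 0; i < n; i++)      // two nested loops: O(n^2)
    for (int j = 0; j < n; j++)
        x += i * j;
// The whole sequence: O(1) + O(n) + O(n^2) = O(n^2),
// because the asymptotically slowest part dominates.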

Estimate the complexity of the following loops (how many times each loop body executes in the worst case); two worked answers follow the code.

// 1
for (i = 0; i < n; i++)
 for (j = 0; j < n; j++, sum++);
// 2
for (i = 0; i < n; i++)
 for (j = 0; j < n; j++) if (a[i] == b[j]) return;
// 3
for (i = 0; i < n; i++)
 for (j = 0; j < n; j++) if (a[i] != b[j]) return;
// 4
for (i = 0; i < n; i++)
 for (j = 0; j < n; j++) if (a[i] == a[j]) return;
// 5
for (i = 0; i < n; i++)
 for (j = 0; j < i; j++) sum++;
// 6
for (i = 0; i < n; i++)
 for (j = 0; j < n*n; j++) sum++;
// 7
for (i = 0; i < n; i++)
 for (j = 0; j < i*i; j++) sum++;
// 8
for (i = 0; i < n; i++)
 for (j = 0; j < i*i; j++)
   for (k = 0; k < j*j; k++) sum++;
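
Two worked answers as a sketch (the rest follow the same pattern): in loop 5 the body executes 0 + 1 + ... + (n - 1) = n(n - 1)/2 times, giving O(n^2); loop 4, in contrast, returns on the very first comparison, because a[0] == a[0] is always true, so it runs in O(1) regardless of the input.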


Logarithmic complexity

Let's look at the loop:

for (sum = 0, i = 1; i < n; i *= 2) sum++;

The variable i takes the values 1, 2, 4, ..., 2^k, ... until it reaches or exceeds n (note that i must start from 1: starting from 0, i *= 2 would keep it at 0 forever). The loop body executes about log2(n) times, so the complexity is O(log n).

Calculating the complexity of a recursion
* Binary search in a sorted array: a recursive algorithm.

int binary_search(const vector<int>& v, int from, int to, int a)
{
   if (from > to)             // empty range: a is not in the array
      return -1;
   int mid = (from + to) / 2;
   if (v[mid] == a)           // found at position mid
      return mid;
   else if (v[mid] < a)       // a can only be in the right half
      return binary_search(v, mid + 1, to, a);
   else                       // a can only be in the left half
      return binary_search(v, from, mid - 1, a);
}
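
A quick usage sketch (the values are illustrative):

vector<int> v = {1, 3, 5, 7, 9};                       // must be sorted
int pos = binary_search(v, 0, (int)v.size() - 1, 7);   // returns 3

Note that the vector is passed by const reference: passing it by value would copy the whole array on every recursive call, and the algorithm would no longer perform only O(log n) work.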

* We count the accesses to the elements of the array.
* The recursive function looks at the middle element and makes a recursive call on a half as large subarray.
* Therefore, if T(n) is the function that gives the number of accesses, then T(n) = T(n/2) + 1.
* From the chain of equalities
                T(n) = T(n/2) + 1 = T(n/4) + 2 = T(n/8) + 3 = ... = T(n/2^k) + k
we obtain for n = 2^k that T(n) = T(1) + log2(n), i.e. the complexity of the algorithm is O(log n).

Limitations of asymptotic notation: dependence on the input data
* Best case, worst case, average case (examples: sorting algorithms; insertion sort, say, makes O(n) comparisons on an already sorted array but O(n^2) in the worst case).