2 Algorithmic Complexity

Complexity is a topic that will come up on literally every page of this book; the discussion of any algorithm or data structure can’t avoid it. After correctness, it is the second most important quality of an algorithm. Moreover, correctness alone often doesn’t matter if complexity is neglected, while the opposite is possible: to compromise correctness somewhat in order to get significantly better complexity. By and large, algorithm theory differs from other subjects of CS in that it is concerned not with presenting a working (correct) way to solve some problem but with finding an efficient way to do it, where efficiency is understood as the minimal (or admissible) number of operations performed and the amount of memory occupied.

In principle, the complexity of an algorithm is the dependence of the number of operations it performs on the size of its input. It is crucial to the scalability of a computer system: it may be easy to solve a programming problem for a particular set of inputs, but how will the solution behave if the input is doubled, or increased tenfold or million-fold? This is not a purely theoretical question, and the analysis of any general-purpose algorithm should give a clear answer to it.

Complexity is a substantial research topic: a whole separate branch of CS, Complexity Theory, exists to study it. Yet, throughout the book, we’ll try to utilize the end results of such research without delving deep into rigorous proofs or complex math, especially since, in most cases, measuring complexity is a matter of simple counting. Let’s look at the following illustrative example:

(defun mat-max (mat)
  (let (max)
    (dotimes (i (array-dimension mat 0))
      (dotimes (j (array-dimension mat 1))
        (when (or (null max)
                  (> (aref mat i j) max))
          (setf max (aref mat i j)))))
    max))

This function finds the maximum element of a two-dimensional array (matrix):

CL-USER> (mat-max #2A((1 2 3) (4 5 6)))
6

What’s its complexity? To answer that, we can just count the number of operations performed: at each iteration of the inner loop, there are 2 comparisons, one of which involves an array access, and, sometimes, if the planets align, we perform another array access for the assignment. The inner loop is executed (array-dimension mat 1) times (let’s call it m; m=3 in the example), and the outer one (array-dimension mat 0) times (n=2 in the example). If we sum this all up, we’ll get n * m * 4 as an upper limit, for the worst case when each subsequent array element is larger than the previous one. As a rule of thumb, each loop adds a multiplication to the formula, and each sequential block adds a plus sign.
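Instead of counting by hand, we can let the code do it. Below is an instrumented sketch of the same function (the *ops* counter and the name mat-max-counted are our own additions for illustration) that tallies the operations of the inner loop as they happen:

(defvar *ops* 0)

;; the same algorithm as mat-max, but counting
;; the operations of the inner loop as we go
(defun mat-max-counted (mat)
  (let (max)
    (dotimes (i (array-dimension mat 0))
      (dotimes (j (array-dimension mat 1))
        ;; 2 comparisons + 1 array access in the test
        (incf *ops* 3)
        (when (or (null max)
                  (> (aref mat i j) max))
          ;; 1 more array access for the assignment
          (incf *ops*)
          (setf max (aref mat i j)))))
    max))

Running it on our example matrix, where each subsequent element is larger than the previous one (i.e. the worst case), confirms the estimate n * m * 4 = 2 * 3 * 4 = 24:

CL-USER> (let ((*ops* 0))
           (mat-max-counted #2A((1 2 3) (4 5 6)))
           *ops*)
24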

In this calculation, there are two variables (the array dimensions n and m) and one constant (the number of operations performed for each array element, which is 4). There exists a special notation, Big-O, used to simplify the representation of the end results of such complexity arithmetic. In it, all constants are reduced to 1, so n * m * 4 becomes just n * m, and, since we don’t care about individual array dimension differences, we can also put n * n instead of n * m. With such simplification, we can write down the final complexity result for this function: O(n^2). In other words, our algorithm has quadratic complexity (which happens to be a variant of a broader class called “polynomial complexity”) in the array dimensions. It means that by increasing the dimensions of our matrix ten times we’ll increase the number of operations a hundred times.

In this case, however, it may be more natural to be concerned with the dependence of the number of operations on the number of elements of the matrix, not on its dimensions. We can observe that n^2 is exactly the number of elements, so the complexity can also be written as just n, if by n we mean the number of elements; then the complexity is linear in the number of elements (O(n)). As you see, it is crucial to understand which n we are talking about!
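To feel this scaling at the REPL, we can time mat-max on square matrices whose dimension grows tenfold. This is just a sketch: exact timings will vary by implementation and machine, but the amount of work should grow roughly a hundredfold:

CL-USER> (let ((mat (make-array '(100 100) :initial-element 1)))
           (time (mat-max mat)))
CL-USER> (let ((mat (make-array '(1000 1000) :initial-element 1)))
           (time (mat-max mat)))

Counted per element, however, the work of the second call is the same as that of the first one, which is exactly the linear view we have just described.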

There are just a few more things to know about Big-O complexity before we can start using it to analyze our algorithms.

  1. There are 6 major complexity classes of algorithms:
    • constant-time (O(1))
    • sublinear (usually logarithmic, O(log n))
    • linear (O(n)) and superlinear (O(n * log n))
    • higher-order polynomial (O(n^c), where c is some constant greater than 1)
    • exponential (O(c^n), where c is usually 2 but, at least, greater than 1)
    • and just plain lunatically complex (O(n!) and so forth), which I jokingly call O(mg)

    Each class represents a step-function change in performance, especially at scale. We’ll talk about each one of them as we discuss the particular examples of algorithms falling into it. To get a feel for the differences right away, see the growth table sketch after this list.

  2. Worst-case vs. average-case behavior. In this example, we saw that there may be two counts of operations: for the average case, we can assume that approximately half of the iterations will require an assignment (which results in 3.5 operations per iteration of the inner loop), while, for the worst case, the number will be exactly 4 (the second snippet after this list demonstrates the difference on concrete inputs). As Big-O reduces all constants to 1, the difference is irrelevant for this example, but there are algorithms for which it is much more drastic and can’t be discarded. Usually, for such algorithms, both complexities should be mentioned (alongside ways to avoid worst-case scenarios): a good example is the quicksort algorithm described in a subsequent chapter.
  3. We have also seen the so-called “constant factors hidden by the Big-O notation”. That is, from the point of view of algorithmic complexity, it doesn’t matter if we need to perform 3 operations in the inner loop or 30. Yet, it is quite important in practice, and we’ll discuss it below when examining binary search. Moreover, some algorithms with better theoretical complexity may perform worse in many practical applications because of these hidden factors (for example, until the dataset reaches a certain size).
  4. Finally, besides execution time complexity, there’s also space complexity, which, instead of the number of operations performed, measures the amount of storage used as a function of the size of the input. In general, similar approaches are applied to its estimation (the last snippet after this list contrasts constant and linear space usage).
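Here is the growth table sketch promised in point 1: a small helper of our own devising (the name print-growth-table is ours, for illustration only) that prints how the major complexity classes scale with n. Even at these modest sizes, the step-function change between the columns is apparent.

;; print the growth of the major complexity classes
;; for a few values of n
(defun print-growth-table ()
  (format t "~&~8@a ~12@a ~14@a ~14@a ~16@a~%"
          "n" "log n" "n log n" "n^2" "2^n")
  (dolist (n '(1 10 100 1000))
    (format t "~8d ~12,2f ~14,1f ~14d ~16,3e~%"
            n (log n 2) (* n (log n 2)) (* n n) (expt 2.0d0 n))))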
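And here is the demonstration promised in point 2, reusing the mat-max-counted sketch defined earlier: an ascending matrix hits the worst case (4 operations per element), while a descending one triggers the assignment branch only once, on the first element.

CL-USER> (let ((*ops* 0))
           (mat-max-counted #2A((1 2 3) (4 5 6)))
           *ops*)
24
CL-USER> (let ((*ops* 0))
           (mat-max-counted #2A((6 5 4) (3 2 1)))
           *ops*)
19

24 is exactly n * m * 4, while 19 is n * m * 3 + 1: both are reduced to the same O(n) in the number of elements, which is why Big-O alone can’t express such differences.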
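As for space complexity, note that mat-max runs in constant additional space, O(1): it allocates a single variable regardless of the input size. For contrast, here is a hypothetical variant (mat-max-copying is our own illustration, not a function from this book’s codebase) that first copies all the elements into a fresh list and thus uses O(n) additional space in the number of elements, gaining nothing in time complexity:

(defun mat-max-copying (mat)
  (let (elems)
    ;; copy every element into a fresh list: O(n) extra space
    (dotimes (i (array-dimension mat 0))
      (dotimes (j (array-dimension mat 1))
        (push (aref mat i j) elems)))
    ;; find the maximum with a second linear pass
    (reduce #'max elems)))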