Computational Complexity

When timing our algorithms we are implicitly asking how many resources an algorithm needs. As mentioned before, these resources can be in terms of time or space requirements. Space refers simply to the amount of information the algorithm needs to “remember” at any one time. We will look at both of these types of resource here.

First, we need an approach distinct from the practical measurements of the previous section. The failing of that approach is that it is hard to compare algorithms across different hardware, as the numbers themselves are somewhat arbitrary and highly coupled to the system running the benchmarks. We need a way to describe the typical performance of an algorithm, abstracted away from the absolute time taken to run it. The way we can analyse this is by asking about scale. In order to do this, let’s create a very simple example algorithm:

julia
function elementmul(a, b)
    # Make sure the inputs are the same size
    @assert all(size(a).==size(b))
    # Allocate a new array to store the result
    c = similar(a)
    for i in eachindex(a)
        c[i] = a[i] * b[i]
    end
    return c
end

This algorithm is only correct when the inputs are arrays of the same size. The eachindex function is an iterator which allows the for loop to iterate through each element in the array a. The similar function is very helpful, as it allows us to create a new array which is the same size and type as a, but with no values specified. The values in this array are arbitrary, as the elements are not initialised when the array is created.
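
As a quick illustration of how this function might be used (the values here are arbitrary):

julia
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
elementmul(a, b)    # returns [4.0, 10.0, 18.0]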

What do we mean when we talk about scale? Let’s think about the size of the inputs. As an example, let’s say that a and b are $n$-dimensional vectors. How will the resources used scale with $n$? In order to answer this, we need to count how many lines of code are executed. If we ignore the size check, we first arrive at allocating the new array. Allocating is the process of carving out a section of memory in which to store some results during the runtime of your program. An array can later be deallocated to free up space for other objects. Allocating is usually an expensive operation, but we don’t know how it scales with the input size, so let’s denote it by $s(n)$, a function describing the time cost of allocating an array using the similar function; for now, let’s assume that this can be done in $\mathcal{O}(n^k)$ where $k \leq 1$.

Next, we have the for loop. We can ignore initialising the for loop, since this does not depend on the number of times the loop body will be executed. However, the code inside the loop will iterate many times: in this case, once for each element in the a (or b) array. This means that this piece of code will execute $n$ times. The return at the end of the function also does not scale with $n$. We can conclude that this function will likely scale linearly with $n$, and that the time taken to run it should be of the form:

$$T(n) = c_1 n + c_0$$

This is the equation of a straight line, and what we can do is actually time this function with the @elapsed macro for different input sizes and plot the results to test our analysis.
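
As a rough sketch of how such an experiment might look (the specific sizes are arbitrary, and Figure 1 uses the more robust @belapsed macro from BenchmarkTools.jl rather than @elapsed):

julia
# Time elementmul for a range of input sizes
ns = [10^k for k in 2:6]    # input sizes to test (arbitrary choice)
times = Float64[]
for n in ns
    a, b = rand(n), rand(n)
    elementmul(a, b)        # warm up: run once so compilation is not timed
    push!(times, @elapsed elementmul(a, b))
end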

Figure 1: Time taken when using elementmul for several values of n. The time taken is measured several times for each value of n using the @belapsed macro from BenchmarkTools.jl.

In Figure 1, you can see that the time forms a roughly straight line against the size of the vectors in log-log space. Since we can model this relationship as a straight line in log-log space, we can write $\log(t - t_0) = m \log(n) + c$, where $m$ is the gradient of the line and $c$ is the intercept. This is equivalent to

$$t = t_0 + A n^m$$

where $A = e^c$ and $t_0$ is some constant overhead time. If we were to calculate this, we would see that $m$ is roughly 1, which means this function scales linearly. This means that our assumption about the similar function was correct, at least in the range of $n$ values seen.
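
As a rough sketch of how one might estimate $m$ from such measurements (assuming the hypothetical ns and times vectors collected in the timing sketch above, and neglecting the constant overhead $t_0$):

julia
# Least-squares fit of a straight line in log-log space: log(t) ≈ m*log(n) + c
x, y = log.(ns), log.(times)
x̄, ȳ = sum(x) / length(x), sum(y) / length(y)
m = sum((x .- x̄) .* (y .- ȳ)) / sum((x .- x̄) .^ 2)   # gradient; should come out close to 1
c = ȳ - m * x̄                                         # intercept, i.e. log(A)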

Question: Can you design an experiment to test the time complexity of the similar function? Do the results change if you compare the distribution of times taken, rather than just the minimum time?

Almost any computer that we run this experiment on will give us the same result: the time taken for elementmul to execute is linearly proportional to the vector size $n$. The results will not be exactly the same, since the gradient and offset of the line will differ; these are the constants $c_1$ and $c_0$ in the equation above. For this reason, computer scientists often use asymptotic notation, more fondly referred to as Big O notation. This notation drops the constants from the equation and simplifies it to include only the fastest-growing term in $n$. This means that if we have a time relationship of $T(n) = c_2 n^2 + c_1 n + c_0$, we would only keep the $n^2$ term, as this term grows the fastest. For this example, the Big O notation would be $\mathcal{O}(n^2)$. The way we identify the important term is by taking the limit as $n$ approaches $\infty$, as shown below:

$$\lim_{n\rightarrow \infty}{T(n)} = \lim_{n\rightarrow \infty}{n^2 \left(c_2 + \frac{c_1}{n} + \frac{c_0}{n^2}\right)} = n^2 (c_2 + 0 + 0) = c_2 n^2$$

One can see that we can factor out the biggest term, $n^2$, and the remaining terms will approach 0 as $n$ approaches $\infty$. This allows us to ignore these terms, leaving only the constant multiplied by $n^2$. Since we are also not interested in the constant, we drop this to leave us with the asymptotic time complexity being $\mathcal{O}(n^2)$.
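
To see this numerically, we can compare the full expression against just its leading term for some made-up constants and watch the ratio settle down as $n$ grows:

julia
T(n)    = 3n^2 + 50n + 1000    # a made-up cost model
lead(n) = 3n^2                 # the leading term only
for n in (10, 1_000, 100_000)
    println("n = ", n, ": T(n)/lead(n) = ", T(n) / lead(n))
end
# The ratio approaches 1, so for large n only the n^2 term matters.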

Examples

The process of assessing runtime is a skill that can be easily learnt with practice. For this reason, let’s go through a few algorithms and assess their computational complexity in both time and space.

Straightforward Nearest Neighbour Calculation

To start with, take a nearest neighbour algorithm. Given many points forming a “point cloud”, the algorithm calculates, for each point, which other point is closest to it. This kind of calculation is commonly used in machine learning classification problems. To define the problem, we have a cloud of $N$ points, each of dimension $D$. The point cloud can therefore be represented by an $N \times D$ array, since we need to store $D$ numbers for each of the $N$ points. The output is an $N$-dimensional array, where the value at index $i$ corresponds to the index of the closest point to the $i$-th point in the point cloud. We also need to define how we measure the distance between points; to keep this simple, let’s use the Euclidean distance.

julia
function nearestneighbour(pointcloud)
    # Get the size and dimension of the point cloud
    N, D = size(pointcloud)
    # Allocate a new array to store the result
    neighbours = zeros(Int, N)
    for i in 1:N
        point_i = @views pointcloud[i, :]
        # Set the current minimum distance to the
        # largest possible value, given the type.
        min_distance_squared = typemax(eltype(pointcloud))
        for j in 1:N
            i == j && continue
            point_j = @views pointcloud[j, :]
            distance_squared = sum((point_i .- point_j).^2)
            if distance_squared < min_distance_squared
                min_distance_squared = distance_squared
                neighbours[i] = j
            end
        end
    end
    return neighbours
end

Now, let’s calculate the computational complexity of this nearest neighbour algorithm.

Let’s move line by line through this algorithm. On the first two lines, we extract some information about the number of input points (N), along with their dimensionality D. We can assume D is fixed at around 2 or 3.

The next line

julia
neighbours = zeros(Int, N)

allocates an array to store the results. We can assume this operation is $\mathcal{O}(N)$ since, at worst, this is a linear relationship: the memory needs to be allocated and then set to 0.

Following our allocation, we enter our first for loop. What one must pay attention to is the bounds of the loop. This loop is executed $N$ times, so let’s keep that in mind. The next two operations

julia
point_i = @views pointcloud[:, i]
min_distance_squared = typemax(eltype(pointcloud))

create variables that are used later on. These two operations are essentially constant time. Let’s again keep this in mind for when we look at the next, nested for loop.

This for loop will again iterate over $N$ objects, which means that this nested for loop is at least $\mathcal{O}(N)$. If you look carefully at all the operations within, you will realise that they happen in roughly constant time (technically linear time with respect to $D$). This means the inner for loop really does have complexity $\mathcal{O}(N)$, which, when combined with the other constant-time operations in the outer loop, remains $\mathcal{O}(N)$. Since this work happens $N$ times, the overall complexity for the outer loop, and hence the whole algorithm, is $\mathcal{O}(N^2)$.
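
As a rough empirical check of this quadratic scaling (a sketch with arbitrary point counts and D fixed at 2), doubling N should roughly quadruple the runtime:

julia
for N in (1_000, 2_000, 4_000)
    pointcloud = rand(N, 2)           # N random points in 2 dimensions
    nearestneighbour(pointcloud)      # warm up: run once so compilation is not timed
    t = @elapsed nearestneighbour(pointcloud)
    println("N = ", N, ": ", t, " seconds")
end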

Recursive Complexity

julia
function recursivecalc(n, a)
    if n==1
        return a*a
    end

    s = 0.0
    for i in 1:n
        s += recursivecalc(n-1, i)
    end
    
    return s
end

Before we go into the details, have a go at calculating the complexity of this algorithm yourself. Try to write down a rough expression for the total number of constant-time operations performed, focusing on how this scales with the input size n.

Since this is a recursive algorithm, let’s write down what happens at one step of the computation. Let us calculate the complexity of recursivecalc(k, 1), where $k > 1$ and $k \in \mathbb{Z}$. We ignore the first check (although we must remember that it takes a constant amount of time). Then we enter a for loop in which each iteration costs $\mathcal{O}(f(k-1))$, where $f$ is the unknown complexity of the algorithm. This means that the computational complexity can be written as

$$\mathcal{O}(f(k)) = \mathcal{O}(k \times f(k-1))$$

This is fundamentally a recursive calculation. If we substitute this equation back into itself, we find that $\mathcal{O}(f(k)) = \mathcal{O}(k(k-1) \times f(k-2))$. Each time we substitute the equation back into itself, we multiply by another term, until we are left with $f(1)$, which we can evaluate exactly: the base case takes constant time, so it contributes a value of just 1. The complexity function is therefore $f(k) = k!$, and hence the computational complexity of this algorithm is $\mathcal{O}(k!)$. If we were to do a similar substitution process in the code, as we did with the equation, we would find that we end up with $k$ nested loops. Each loop multiplies the complexity by the number of times it is iterated through. This is a simple rule to use when calculating complexity: whenever you see nested for loops, think about multiplying the complexities of the loops together to find the resulting complexity. The only complication is when there is complex control flow logic, which can make these calculations more difficult.
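
As a rough empirical check (a sketch using countedcalc, a hypothetical copy of the function instrumented with a global call counter), the number of recursive calls does indeed grow in proportion to $k!$:

julia
const CALLS = Ref(0)    # global call counter

function countedcalc(n, a)
    CALLS[] += 1
    n == 1 && return a * a
    s = 0.0
    for i in 1:n
        s += countedcalc(n - 1, i)
    end
    return s
end

for k in 1:8
    CALLS[] = 0
    countedcalc(k, 1)
    println("k = ", k, ": ", CALLS[], " calls (k! = ", factorial(k), ")")
end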

Tree-like Complexity

A tree is a very common data structure in computer science, since tree-based algorithms often have far better computational complexity. Let’s take the example of finding whether a number exists in a sorted list. Say we have a sorted array of $N$ numbers and want to find whether a given number exists within that array. An implementation of this algorithm could look like the following:

julia
function contains(numbers, num)
    n = length(numbers)
    @assert n >= 1
    startpoint, endpoint = 1, n
    while endpoint - startpoint > 1
        midpoint = (startpoint+endpoint)÷2
        if num < numbers[midpoint]
            endpoint = midpoint
        elseif num > numbers[midpoint]
            startpoint = midpoint
        else
            return true
        end
    end
    return numbers[startpoint] == num || numbers[endpoint] == num
end

We shall focus on the while loop, since all the other lines run in constant time. Because we are dealing with a while loop, it is not immediately obvious how many times it is executed. However, we know that the range being checked halves upon every iteration. The loop ends either when the number is found, or when the range has contracted to a width of 1. This means that we need to ask: how many times do you have to halve $n$ to get to 1? In mathematical terms, we are asking which $x$ satisfies the equation $n \times (1/2)^x = 1$, which is solved by $x = \log_2(n)$. This means that this algorithm scales with the logarithm of $n$, and hence the complexity is simply written as $\mathcal{O}(\log(n))$.

This process of cutting down a search space by a constant factor is surprisingly common in many tree-like algorithms. An $\mathcal{O}(\log(n))$ algorithm is far more performant than even a linearly complex ($\mathcal{O}(n)$) one. Consider a linear implementation of this same problem, which iterates through the numbers in the list in order until the number is found or passed. This implementation would, on average, take around $n/2$ iterations (assuming the search item is equally likely to lie anywhere in the array), and hence would be a linear algorithm. At scale, the tree-like implementation would be many orders of magnitude faster than any linear implementation, as the comparison below illustrates.
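
As a simple illustration of how large that gap becomes, here is a sketch comparing the expected iteration counts (not actual timings) of the two approaches:

julia
# Expected iterations: roughly n/2 for a linear scan versus log2(n) for repeated halving
for n in (10^3, 10^6, 10^9)
    println("n = ", n, ": linear ≈ ", n ÷ 2, ", halving ≈ ", ceil(Int, log2(n)))
end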

Conclusion

Hopefully, one has now gleaned an intuition for calculating computational complexity. While the topic may at first appear somewhat intimidating, one need only count how many times each line is run, which is usually rather straightforward. This process can become arbitrarily difficult for more obscure and complex algorithms, especially when the algorithm relies on random numbers. The main takeaway is that nested for loops incur a huge computational cost, especially when processing large datasets. If one can avoid this cost with a more efficient algorithm, it is usually worth the effort to do so, especially when the extra speed is needed.

Sometimes, algorithms are a bit too complex to accurately work out the complexity. In that case, it is usually worth running some benchmarks for varying input sizes and plotting the results on a graph to see how the code scales empirically. This approach is also advantageous in that it avoids one of the biggest pitfalls of using complexity to judge how quickly an algorithm should execute: Big $\mathcal{O}$ notation drops vital information needed to understand how long an algorithm will take to run. It can only tell us how that runtime scales with some input size $n$ as $n \rightarrow \infty$. All the constants we dropped in our analysis can have a huge impact on how efficient a given implementation is. For this reason, one should rely on complexity only as a heuristic to guide which algorithm to use in a given situation, and rely on empirical data to accurately compare implementations of algorithms with similar complexity.