Computational Complexity

When timing our algorithms we are implicitly asking how many resources an algorithm needs. As mentioned before, these resources can be in terms of time or space requirements. Space refers simply to the amount of information the algorithm needs to “remember” at any one time. We will look at both of these types of resource here.

First, we need an approach distinct from the practical measurements of the previous section. The failing of that approach is that it is hard to compare algorithms across different hardware, as the numbers themselves are somewhat arbitrary and highly coupled to the system running the benchmarks. We need a way to describe the typical performance of an algorithm, abstracted away from the absolute time taken to run it. The way we can analyse this is by asking about scale. In order to do this, let’s create a very simple example algorithm:

julia
function elementmul(a, b)
    # Make sure the inputs are the same size
    @assert all(size(a).==size(b))
    # Allocate a new array to store the result
    c = similar(a)
    for i in eachindex(a)
        c[i] = a[i] * b[i]
    end
    return c
end

This algorithm is only correct when the inputs are arrays of the same size. The eachindex function is an iterator which allows the for loop to iterate through each element in the array a. The similar function is very helpful, as it allows us to create a new array which is the same size and type as a, but with no values specified. The values in this array are arbitrary, as the elements are not initialised when the array is created.
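
As a quick illustration of how this function might be used (the values here are arbitrary):

julia
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
elementmul(a, b)    # returns [4.0, 10.0, 18.0]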

What do we mean when we talk about scale? Let’s think about the size of the inputs. As an example, let’s say that a and b are $n$-dimensional vectors. How will the resources used scale with $n$? In order to answer this, we need to count how many lines of code are executed. If we ignore the size check, we first arrive at allocating the new array. Allocating is the process of carving out a section of memory in which to store some results during the runtime of your program. An array can later be deallocated to free up space for other objects. Allocating is usually an expensive operation, but we don’t know how it scales with the input size, so let’s denote it by $s(n)$, a function describing the time cost of allocating an array using the similar function; for now, let’s assume that this can be done in $\mathcal{O}(n^k)$ where $k \leq 1$.

Next, we have the for loop. We can ignore initialising the for loop, since this does not depend on the number of times the loop body will be executed. However, the code inside the loop will iterate many times: in this case, once for each element in the a (or b) array. This means that this piece of code will execute $n$ times. The return at the end of the function also does not scale with $n$. We can conclude that this function will likely scale linearly with $n$, and that the time taken to run it should be of the form:

$$T(n) = c_1 n + c_0$$

This is the equation of a straight line, and what we can do is actually time this function with the @elapsed macro for different input sizes and plot the results to test our analysis.
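
As a rough sketch of how such an experiment might look (the specific sizes are arbitrary, and Figure 1 uses the more robust @belapsed macro from BenchmarkTools.jl rather than @elapsed):

julia
# Time elementmul for a range of input sizes
ns = [10^k for k in 2:6]    # input sizes to test (arbitrary choice)
times = Float64[]
for n in ns
    a, b = rand(n), rand(n)
    elementmul(a, b)        # warm up: run once so compilation is not timed
    push!(times, @elapsed elementmul(a, b))
end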

Figure 1: Time taken when using elementmul for several values of n. The time taken is measured several times for each value of n using the @belapsed macro from BenchmarkTools.jl.

In Figure 1, you can see that the time forms a roughly straight line against the size of the vectors in log-log space. Since we can model this relationship as a straight line in log-log space, we can write $\log(t - t_0) = m \log(n) + c$, where $m$ is the gradient of the line and $c$ is the intercept. This is equivalent to

$$t = t_0 + A n^m$$

where $A = e^c$ and $t_0$ is some constant overhead time. If we were to calculate this, we would see that $m$ is roughly 1, which means this function scales linearly. This means that our assumption about the similar function was correct, at least in the range of $n$ values seen.
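
As a rough sketch of how one might estimate $m$ from such measurements (assuming the hypothetical ns and times vectors collected in the timing sketch above, and neglecting the constant overhead $t_0$):

julia
# Least-squares fit of a straight line in log-log space: log(t) ≈ m*log(n) + c
x, y = log.(ns), log.(times)
x̄, ȳ = sum(x) / length(x), sum(y) / length(y)
m = sum((x .- x̄) .* (y .- ȳ)) / sum((x .- x̄) .^ 2)   # gradient; should come out close to 1
c = ȳ - m * x̄                                         # intercept, i.e. log(A)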

Question: Can you design an experiment to test the time complexity of the similar function? Do the results change if you compare the distribution of times taken, rather than just the minimum time?

Almost any computer that we run this experiment on will give us the same result: the time taken for elementmul to execute is linearly proportional to the vector size $n$. The results will not be exactly the same, since the gradient and offset of the line will differ; these are the constants $c_1$ and $c_0$ in the equation above. For this reason, computer scientists often use asymptotic notation, more fondly referred to as Big O notation. This notation drops the constants from the equation and simplifies it to include only the fastest-growing term in $n$. This means that if we have a time relationship of $T(n) = c_2 n^2 + c_1 n + c_0$, we would only keep the $n^2$ term, as this term grows the fastest. For this example, the Big O notation would be $\mathcal{O}(n^2)$. The way we identify the important term is by taking the limit as $n$ approaches $\infty$, as shown below:

$$\lim_{n\rightarrow \infty}{T(n)} = \lim_{n\rightarrow \infty}{n^2 \left(c_2 + \frac{c_1}{n} + \frac{c_0}{n^2}\right)} = n^2 (c_2 + 0 + 0) = c_2 n^2$$

One can see that we can factor out the biggest term, $n^2$, and the remaining terms will approach 0 as $n$ approaches $\infty$. This allows us to ignore these terms, leaving only the constant multiplied by $n^2$. Since we are also not interested in the constant, we drop this to leave us with the asymptotic time complexity being $\mathcal{O}(n^2)$.
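
To see this numerically, we can compare the full expression against just its leading term for some made-up constants and watch the ratio settle down as $n$ grows:

julia
T(n)    = 3n^2 + 50n + 1000    # a made-up cost model
lead(n) = 3n^2                 # the leading term only
for n in (10, 1_000, 100_000)
    println("n = ", n, ": T(n)/lead(n) = ", T(n) / lead(n))
end
# The ratio approaches 1, so for large n only the n^2 term matters.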

Examples

The process of assessing runtime is a skill that can be easily learnt with practice. For this reason, let’s go through a few algorithms and assess their computational complexity in both time and space.

Straightforward Nearest Neighbour Calculation

To start with, take a nearest neighbour algorithm. Given many points forming a “point cloud”, the algorithm calculates, for each point, which other point is closest to it. This kind of calculation is commonly used in machine learning classification problems. To define the problem, we have a cloud of $N$ points, each of dimension $D$. The point cloud can therefore be represented by an $N \times D$ array, since we need to store $D$ numbers for each of the $N$ points. The output is an $N$-dimensional array, where the value at index $i$ corresponds to the index of the closest point to the $i$-th point in the point cloud. We also need to define how we measure the distance between points; to keep this simple, let’s use the Euclidean distance.

julia
function nearestneighbour(pointcloud)
    # Get the size and dimension of the point cloud
    N, D = size(pointcloud)
    # Allocate a new array to store the result
    neighbours = zeros(Int, N)
    for i in 1:N
        point_i = @views pointcloud[i, :]
        # Set the current minimum distance to the
        # largest possible value, given the type.
        min_distance_squared = typemax(eltype(pointcloud))
        for j in 1:N
            i == j && continue
            point_j = @views pointcloud[j, :]
            distance_squared = sum((point_i .- point_j).^2)
            if distance_squared < min_distance_squared
                min_distance_squared = distance_squared
                neighbours[i] = j
            end
        end
    end
    return neighbours
end

Now, let’s calculate the computational complexity of this nearest neighbour algorithm.

Let’s move line by line through this algorithm. On the first two lines, we extract some information about the number of input points (N), along with their dimensionality D. We can assume D is fixed at around 2 or 3.

The next line

julia
neighbours = zeros(Int, N)

allocates an array to store the results. We can assume this operation is $\mathcal{O}(N)$ since, at worst, this is a linear relationship: the memory needs to be allocated and then set to 0.

Following our allocation, we enter our first for loop. What one must pay attention to is the bounds of the loop. This loop is executed $N$ times, so let’s keep that in mind. The next two operations

julia
point_i = @views pointcloud[:, i]
min_distance_squared = typemax(eltype(pointcloud))

create variables that are used later on. These two operations are essentially constant time. Let’s again keep this in mind for when we look at the next, nested for loop.

This for loop will again iterate over $N$ objects, which means that this nested for loop is at least $\mathcal{O}(N)$. If you look carefully at all the operations within, you will realise that they happen in roughly constant time (technically linear time with respect to $D$). This means the inner for loop really does have complexity $\mathcal{O}(N)$, which, when combined with the other constant-time operations in the outer loop, remains $\mathcal{O}(N)$. Since this work happens $N$ times, the overall complexity for the outer loop, and hence the whole algorithm, is $\mathcal{O}(N^2)$.
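
As a rough empirical check of this quadratic scaling (a sketch with arbitrary point counts and D fixed at 2), doubling N should roughly quadruple the runtime:

julia
for N in (1_000, 2_000, 4_000)
    pointcloud = rand(N, 2)           # N random points in 2 dimensions
    nearestneighbour(pointcloud)      # warm up: run once so compilation is not timed
    t = @elapsed nearestneighbour(pointcloud)
    println("N = ", N, ": ", t, " seconds")
end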

Recursive Complexity

julia
function recursivecalc(n, a)
    if n==1
        return a*a
    end

    s = 0.0
    for i in 1:n
        s += recursivecalc(n-1, i)
    end
    
    return s
end

Before we go into the details, have a go at calculating the complexity of this algorithm yourself. Try to write down a rough expression for the total number of constant-time operations performed, focusing on how this scales with the input size n.

Since this is a recursive algorithm, let’s write down what happens at one step of the computation. Let us calculate the complexity of recursivecalc(k, 1), where $k > 1$ and $k \in \mathbb{Z}$. We ignore the first check (although we must remember that it takes a constant amount of time). Then we enter a for loop in which each iteration costs $\mathcal{O}(f(k-1))$, where $f$ is the unknown complexity of the algorithm. This means that the computational complexity can be written as

$$\mathcal{O}(f(k)) = \mathcal{O}(k \times f(k-1))$$

This is fundamentally a recursive calculation. If we substitute this equation back into itself, we find that $\mathcal{O}(f(k)) = \mathcal{O}(k(k-1) \times f(k-2))$. Each time we substitute the equation back into itself, we multiply by another term, until we are left with $f(1)$, which we can evaluate exactly: the base case takes constant time, so it contributes a value of just 1. The complexity function is therefore $f(k) = k!$, and hence the computational complexity of this algorithm is $\mathcal{O}(k!)$. If we were to do a similar substitution process in the code, as we did with the equation, we would find that we end up with $k$ nested loops. Each loop multiplies the complexity by the number of times it is iterated through. This is a simple rule to use when calculating complexity: whenever you see nested for loops, think about multiplying the complexities of the loops together to find the resulting complexity. The only complication is when there is complex control flow logic, which can make these calculations more difficult.
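
As a rough empirical check (a sketch using countedcalc, a hypothetical copy of the function instrumented with a global call counter), the number of recursive calls does indeed grow in proportion to $k!$:

julia
const CALLS = Ref(0)    # global call counter

function countedcalc(n, a)
    CALLS[] += 1
    n == 1 && return a * a
    s = 0.0
    for i in 1:n
        s += countedcalc(n - 1, i)
    end
    return s
end

for k in 1:8
    CALLS[] = 0
    countedcalc(k, 1)
    println("k = ", k, ": ", CALLS[], " calls (k! = ", factorial(k), ")")
end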

Tree-like Complexity

A tree is a very common data structure in computer science, since tree-based algorithms often have far better computational complexity. Let’s take the example of finding whether a number exists in a sorted list. Say we have a sorted array of $N$ numbers and want to find whether a given number exists within that array. An implementation of this algorithm could look like the following:

julia
function contains(numbers, num)
    n = length(numbers)
    @assert n >= 1
    startpoint, endpoint = 1, n
    while endpoint - startpoint > 1
        midpoint = (startpoint+endpoint)÷2
        if num < numbers[midpoint]
            endpoint = midpoint
        elseif num > numbers[midpoint]
            startpoint = midpoint
        else
            return true
        end
    end
    return numbers[startpoint] == num || numbers[endpoint] == num
end

We shall focus on the while loop, since all the other lines run in constant time. Because we are dealing with a while loop, it is not immediately obvious how many times it is executed. However, we know that the range being checked halves upon every iteration. The loop ends either when the number is found, or when the range has contracted to a width of 1. This means that we need to ask: how many times do you have to halve $n$ to get to 1? In mathematical terms, we are asking which $x$ satisfies the equation $n \times (1/2)^x = 1$, which is solved by $x = \log_2(n)$. This means that this algorithm scales with the logarithm of $n$, and hence the complexity is simply written as $\mathcal{O}(\log(n))$.

This process of cutting down a search space by a constant factor is surprisingly common in many tree-like algorithms. An $\mathcal{O}(\log(n))$ algorithm is far more performant than even a linearly complex ($\mathcal{O}(n)$) one. Consider a linear implementation of this same problem, which iterates through the numbers in the list in order until the number is found or passed. This implementation would, on average, take around $n/2$ iterations (assuming the search item is equally likely to lie anywhere in the array), and hence would be a linear algorithm. At scale, the tree-like implementation would be many orders of magnitude faster than any linear implementation, as the comparison below illustrates.
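
As a simple illustration of how large that gap becomes, here is a sketch comparing the expected iteration counts (not actual timings) of the two approaches:

julia
# Expected iterations: roughly n/2 for a linear scan versus log2(n) for repeated halving
for n in (10^3, 10^6, 10^9)
    println("n = ", n, ": linear ≈ ", n ÷ 2, ", halving ≈ ", ceil(Int, log2(n)))
end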

Conclusion

Hopefully, one has now gleaned an intuition for calculating computational complexity. While the topic may at first appear somewhat intimidating, one need only count how many times each line is run, which is usually rather straightforward. This process can become arbitrarily difficult for more obscure and complex algorithms, especially when the algorithm relies on random numbers. The main takeaway is that nested for loops incur a huge computational cost, especially when processing large datasets. If one can avoid this cost with a more efficient algorithm, it is usually worth the effort to do so, especially when the extra speed is needed.

Sometimes, algorithms are a bit too complex to accurately work out the complexity. In that case, it is usually worth running some benchmarks for varying input sizes and plotting the results on a graph to see how the code scales empirically. This approach is also advantageous in that it avoids one of the biggest pitfalls of using complexity to judge how quickly an algorithm should execute: Big $\mathcal{O}$ notation drops vital information needed to understand how long an algorithm will take to run. It can only tell us how that runtime scales with some input size $n$ as $n \rightarrow \infty$. All the constants we dropped in our analysis can have a huge impact on how efficient a given implementation is. For this reason, one should rely on complexity only as a heuristic to guide which algorithm to use in a given situation, and rely on empirical data to accurately compare implementations of algorithms with similar complexity.