Computational Complexity
When timing our algorithms, we are implicitly asking how many resources an algorithm needs. As mentioned before, these resources can be in terms of time or space requirements. Space refers simply to the amount of information the algorithm needs to “remember” at any one time. We will look at both of these types of resources here.
First, we need an approach distinct from the practical measurements of the previous section. The failing of that approach is that it is difficult to compare algorithms across different hardware, as the numbers themselves are somewhat arbitrary and highly coupled to the system running the benchmarks. We need a way to describe the typical performance of an algorithm, abstracted away from the absolute time taken to run it. The way we can analyse this is by asking about scale. In order to do this, let’s create a very simple example algorithm:
function elementmul(a, b)
    # Make sure the inputs are the same size
    @assert all(size(a) .== size(b))
    # Allocate a new array to store the result
    c = similar(a)
    for i in eachindex(a)
        c[i] = a[i] * b[i]
    end
    return c
end
This algorithm is only correct when the inputs are arrays of the same size. The eachindex function is an iterator which allows the for loop to iterate through each element in the array a. The similar function is very helpful, as it allows us to create a new array which is the same size and type as a, but with no values specified. The values in this array are effectively arbitrary, as the elements are not initialised when the array is created.
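As a quick sanity check, here is a small, hypothetical usage example; the result should agree with Julia’s built-in broadcasting a .* b.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = elementmul(a, b)    # gives [4.0, 10.0, 18.0]
@assert c == a .* b     # agrees with built-in broadcasting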
What do we mean when we talk about scale? Let’s think about the size of the inputs. As an example, let’s say that a and b are $n$-dimensional vectors. How will the resources used scale with $n$? In order to answer this, we need to count how many lines of code are executed. If we ignore the size check, we first arrive at allocating the new array. Allocating is the process of carving out a section of your memory in which to store some results during the runtime of your program. An array can be deallocated to free up space for other objects during the runtime. Allocating is usually an expensive operation, but we don’t know how it scales with the input size, so let’s just denote this with $A(n)$, a function describing the time cost of allocating an array using the similar function; for now, let’s assume that this can be done in $\mathcal{O}(n^k)$ time, where $k \leq 1$. Next, we have the for loop. We can ignore initialising the for loop, since this is not dependent on the number of times the for loop will be executed. However, the code inside the for loop will iterate many times. In this case, there is an iteration for each element in the a (or b) array. This means that this piece of code will execute $n$ times. The return at the end of the function also does not scale with $n$. We can conclude that this piece of code will likely scale linearly with $n$. This means that the time taken to run this code should be of the form:
$$t(n) = an + b$$
This is the equation of a straight line, and what we can do is actually time this function using the @elapsed macro for different sized inputs, and graph the results to test our analysis, as sketched below.
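A minimal sketch of such an experiment is shown below, assuming elementmul is defined as above; the sizes chosen are arbitrary, and the gradient is estimated with a simple least-squares fit in log-log space.
# Time elementmul for a range of input sizes
sizes = [2^k for k in 10:2:20]
times = Float64[]
for n in sizes
    a, b = rand(n), rand(n)
    elementmul(a, b)                        # warm-up call (triggers compilation)
    push!(times, @elapsed elementmul(a, b))
end

# Estimate the gradient m of the straight line in log-log space
x, y = log.(sizes), log.(times)
xbar, ybar = sum(x)/length(x), sum(y)/length(y)
m = sum((x .- xbar) .* (y .- ybar)) / sum((x .- xbar).^2)
println("estimated gradient m ≈ ", m)       # expect a value close to 1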
In Figure 1, you can see that the time is roughly proportional to the size of the vectors, appearing as a straight line in log-log space. Since we can model this relationship as a straight line in log-log space, we can write the plotted relationship as $\log t = m \log n + c$, where $m$ is the gradient of the line and $c$ is the $y$-intercept. This is equivalent to
$$t = a n^m + b$$
where $a = e^{c}$ and $b$ is some constant overhead time. If we were to calculate this, we would see that $m$ is roughly 1, which means this function scales linearly. This means that our assumption about the similar function was correct, at least in the range of values of $n$ seen.
Question: Can you design an experiment to test the time complexity of the similar function? Do the results change if you compare the full distribution of times taken, rather than just the minimum time?
Almost any computer that we run this experiment on will give us the same result: the time taken for elementmul to execute is linearly proportional to the vector size $n$. The results will not be exactly the same, since the gradient and offset of the line will be different. These are the constants $a$ and $b$ in the equation above. For this reason, Computer Scientists often use asymptotic notation, more fondly referred to as Big O notation. This type of notation drops the constants from the equation, but also simplifies the equation to only include the fastest-scaling terms in $n$. This means that if we have a time relationship of $t(n) = an^2 + bn + c$, we would only keep the $n^2$ term, as this term grows the fastest. For this example, the Big O notation would be $\mathcal{O}(n^2)$. The way we calculate the important terms is by taking limits as $n$ approaches $\infty$, as shown below:
$$\lim_{n \to \infty} \left( an^2 + bn + c \right) = \lim_{n \to \infty} n^2 \left( a + \frac{b}{n} + \frac{c}{n^2} \right)$$
One can see that we can factor out the biggest term, and the remaining terms will approach 0 as $n$ approaches $\infty$. This allows us to ignore these terms, leaving only the constant $a$ multiplied by $n^2$. Since we are also not interested in the constant, we drop this to leave us with the asymptotic time complexity being $\mathcal{O}(n^2)$.
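To make this limiting argument concrete, here is a tiny numerical check using arbitrarily chosen constants; the ratio between $an^2 + bn + c$ and $an^2$ tends towards 1 as $n$ grows, so the lower-order terms become irrelevant.
a, b, c = 3.0, 50.0, 1000.0                 # arbitrary constants
for n in (10, 100, 10_000, 1_000_000)
    ratio = (a*n^2 + b*n + c) / (a*n^2)
    println("n = ", n, "  ratio = ", ratio)
end
# The ratio approaches 1, so only the n^2 term matters asymptotically.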
Examples
The process of assessing runtime is a skill that can be easily learnt with practice. For this reason, let’s go through a few algorithms and assess their computational complexity in both time and space.
Straightforward Nearest Neighbour Calculation
To start with, take a nearest neighbour algorithm. Given many points forming a “point cloud”, the algorithm calculates, for each point, which other point in the cloud is closest to it. This kind of algorithm is commonly used in machine learning classification problems. To define this problem, we have a cloud of $D$-dimensional points, and we have $N$ of these points. The point cloud can therefore be represented by a $D \times N$ array, since we need to store $D$ numbers for each of the $N$ points. The output is an $N$-dimensional array: the value at index $i$ in the array corresponds to the index of the closest point to the $i$-th point in the point cloud. We also need to define how we measure the distance between points; to keep this simple, let’s use the Euclidean distance.
function nearestneighbour(pointcloud)
    # Get the dimension and size of the point cloud
    D, N = size(pointcloud)
    # Allocate a new array to store the result
    neighbours = zeros(Int, N)
    for i in 1:N
        point_i = @views pointcloud[:, i]
        # Set the current minimum distance to the
        # largest possible value, given the type.
        min_distance_squared = typemax(eltype(pointcloud))
        for j in 1:N
            i == j && continue
            point_j = @views pointcloud[:, j]
            distance_squared = sum((point_i .- point_j).^2)
            if distance_squared < min_distance_squared
                min_distance_squared = distance_squared
                neighbours[i] = j
            end
        end
    end
    return neighbours
end
Now, let’s calculate the computational complexity of this nearest neighbour algorithm.
Let’s move line by line through this algorithm. At the top of the function, we extract some information about the number of input points (N), along with their dimensionality D. We can assume D is fixed and small (say, 2 or 3).
The next line
neighbours = zeros(Int, N)
allocates an array to store the results. We can assume this operation is $\mathcal{O}(N)$ since, at worst, this is a linear relationship: the memory needs to be allocated and then each element set to 0.
Following our allocation, we enter our first for loop. What one must pay attention to is the bounds of the loop. This loop is executed $N$ times, so let’s keep that in mind. The next two operations
point_i = @views pointcloud[:, i]
min_distance_squared = typemax(eltype(pointcloud))
create variables that are used later on. These two operations are essentially constant time. Let’s again keep this in mind for when we look at the next, nested for loop.
This for loop again will iterate over $N$ objects, which means that this nested for loop is at least $\mathcal{O}(N)$. If you look carefully at all the operations within, you will realise that they happen in roughly constant time (technically linear time with respect to $D$, which we treat as a fixed constant). This means the inner for loop really does have complexity $\mathcal{O}(N)$, which, when combined with the other constant-time operations in the outer loop, remains $\mathcal{O}(N)$. Since this cost is incurred $N$ times, the overall complexity for the outer loop, and hence for the whole algorithm, is $\mathcal{O}(N^2)$.
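We can check this $\mathcal{O}(N^2)$ scaling empirically, in the same way as before. A rough sketch (with arbitrarily chosen values of N and D) is given below; each doubling of N should roughly quadruple the runtime.
D = 3                                        # assume a small, fixed dimension
for N in (1_000, 2_000, 4_000)
    pointcloud = rand(D, N)
    nearestneighbour(pointcloud)             # warm-up call (triggers compilation)
    t = @elapsed nearestneighbour(pointcloud)
    println("N = ", N, "  time = ", t, " s")
end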
Recursive Complexity
function recursivecalc(n, a)
    if n == 1
        return a*a
    end
    s = 0.0
    for i in 1:n
        s += recursivecalc(n-1, i)
    end
    return s
end
Before we go into the details, have a go at calculating the complexity of this algorithm yourself. Just try to write down an expression for the total number of constant time operations performed (roughly), focusing on how this will scale with the input size n.
Since this is a recursive algorithm, let’s write down what happens at one step of this computation. Let us calculate the complexity $T(k)$ of recursivecalc(k, 1), where $k$ is an integer and $k > 1$. We ignore the first check (however, we must remember that this takes a constant amount of time). Then we go into a for loop which runs $k$ times, with each iteration taking $T(k-1)$ time, where $T$ is the unknown complexity of the algorithm. This means that, ignoring the constant terms, the computational complexity can be written as
$$T(k) = k \, T(k-1)$$
This fundamentally is a recursive calculation. If we substitute this equation back into itself, we find that $T(k) = k(k-1)\,T(k-2)$. Each time we substitute the equation back into itself, we multiply in another factor, until we are left with $T(1)$, which can be calculated exactly: the base case takes a constant amount of time, so we give it a value of just 1. The complexity function is therefore given by $T(n) = n!$, and hence the computational complexity of this algorithm is $\mathcal{O}(n!)$. If we were to do a similar substitution process in the code, as we did with the equation, we would find that we would have $n$ nested loops. Each loop multiplies the complexity by the number of times the loop is iterated through. This is a simple rule to use when calculating complexity: whenever you see a for loop, think about multiplying the complexities of each loop together to find the resulting complexity. The only complication is when there is complex control flow logic, which can make these calculations more difficult.
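One way to convince yourself of this factorial growth is to instrument the recursion with a call counter; the hypothetical variant recursivecalccount below does exactly that, and the total number of calls grows in proportion to n!.
# Same recursion as recursivecalc, but counting every call made
function recursivecalccount(n, a, counter)
    counter[] += 1
    if n == 1
        return a*a
    end
    s = 0.0
    for i in 1:n
        s += recursivecalccount(n-1, i, counter)
    end
    return s
end

for n in 1:8
    counter = Ref(0)
    recursivecalccount(n, 1, counter)
    println("n = ", n, "  calls = ", counter[], "  n! = ", factorial(n))
end
# The call count grows in proportion to n!, matching the O(n!) analysis.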
Tree-like Complexity
A tree is a very common data structure in computer science, since algorithms built on trees often have far better computational complexity. Let’s take the example of searching a sorted list, as you would when deciding where to insert a new number. Let’s say we have a sorted array of numbers and want to find whether a given number exists within that array. An implementation of this algorithm could look like the following:
function contains(numbers, num)
    n = length(numbers)
    @assert n >= 1
    startpoint, endpoint = 1, n
    while endpoint - startpoint > 1
        midpoint = (startpoint + endpoint) ÷ 2
        if num < numbers[midpoint]
            endpoint = midpoint
        elseif num > numbers[midpoint]
            startpoint = midpoint
        else
            return true
        end
    end
    return numbers[startpoint] == num || numbers[endpoint] == num
end
We shall focus on the while loop; all the other lines run in constant time. However, since we are dealing with a while loop, it is not immediately obvious how many times it is executed. What we do know is that the range being checked halves upon every iteration of the while loop. The while loop ends either when the number is found, or when the range has contracted to a width of 1. This means that we need to ask the question: how many times do you have to halve $n$ to get 1? In mathematical terms, we are asking which $k$ satisfies the equation $n / 2^k = 1$, which is solved by $k = \log_2 n$. This means that this algorithm scales with the logarithm of $n$, and hence the complexity is simply written as $\mathcal{O}(\log n)$.
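As a quick check of how slowly the logarithm grows, the snippet below prints $\log_2 n$ for a few sizes; even a billion elements need only around 30 halvings.
for n in (1_000, 1_000_000, 1_000_000_000)
    println("n = ", n, "  log2(n) ≈ ", round(log2(n), digits=1))
end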
This process of cutting down a search space by a constant factor is surprisingly common in many tree-like algorithms. Having an $\mathcal{O}(\log n)$ algorithm is far more performant than even linearly complex ($\mathcal{O}(n)$) algorithms. Consider a linear implementation for this same problem that iterates through each of the numbers in the list in order, until the correct place to stop is found. This implementation would, on average, take around $n/2$ iterations (assuming the search item is equally likely to sit anywhere in the array), and hence would be a linear algorithm. At scale, the tree-like implementation would be many orders of magnitude faster than any linear implementation.
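For comparison, a hypothetical linear implementation, called containslinear here, might look like the following; it scans the sorted array from the front and stops as soon as it passes the place where the number would sit.
# A linear-time alternative to the binary search above
function containslinear(numbers, num)
    for x in numbers
        x == num && return true
        x > num && return false    # the array is sorted, so num cannot appear later
    end
    return false
end
Timing both versions on a large sorted array with @elapsed makes the difference in scaling very clear.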
Conclusion
Hopefully, one has gleaned an intuition for calculating computational complexity. While the topic may at first appear somewhat intimidating, one need only be able to count how many times each line is run, which is usually rather straightforward. This process can become arbitrarily more difficult for more obscure and complex algorithms, especially when the algorithm relies on random numbers. The main takeaway here is that nested for loops incur a huge computational cost, especially when processing large datasets. If one can avoid this cost with a more efficient algorithm, then it is usually worth the effort to use it, especially when performance matters.
Sometimes, algorithms are a bit too complex to accurately work out the complexity. In that case, it is usually worth running some benchmarks for varying input sizes and plotting the results on a graph to see how the code scales empirically. This approach is also advantageous in that it avoids one of the biggest pitfalls of using complexity to judge how quickly an algorithm should execute: Big O notation drops vital information needed to understand how long an algorithm will take to execute. It can only tell us how that runtime scales with some input size $n$ as $n \to \infty$. All the constants we dropped in our analysis can have a huge impact on how efficient a given implementation is. For this reason, one should only rely on complexity as a heuristic to guide which algorithm to use in a given situation, and rely on empirical data to accurately compare implementations of algorithms with similar complexity.