Exercises
Before beginning the exercises, check that your computer has multiple cores, then run Julia with access to additional threads: either set the environment variable JULIA_NUM_THREADS and restart Julia, or launch Julia with the command julia -t 4, which starts the Julia REPL with 4 threads1. The number of threads should be set to the number of logical cores on your computer. One can check the number of available threads by calling Threads.nthreads() in the REPL.
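As a quick check, the following snippet prints the thread count for the current session (it assumes Julia was started with one of the options above):

```julia
# Verify how many threads this Julia session can use.
# Started e.g. with `julia -t 4` or with JULIA_NUM_THREADS set.
println("Threads available: ", Threads.nthreads())
```

If this prints 1, multithreaded code will still run, but on a single thread.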
Question 1
Consider the function:
"""Generates `k` random numbers."""
function generate_random_numbers(k)
    for _ in 1:k
        rand()
    end
    nothing
end
This function can simulate a variable amount of work: it can be called via generate_random_numbers(k), where k is chosen to mimic short or long workloads. One can imagine having to repeat this work n times. Empirically measure the time taken for these variable workloads as a function of n, with and without Threads.@threads. Use these times to create a plot showing the performance of the multithreaded task and the serial task versus the number of repetitions n. Reproduce this graph for varying workload sizes k, e.g. a small k (short), a moderate k (medium) and a large k (long), adjusting the range of n used to compensate for the increased workload of the loop.
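One possible timing harness is sketched below; the helper names time_serial and time_threaded are ours, not from the exercise, and a real benchmark would average several runs:

```julia
using Base.Threads  # provides @threads

"""Generates `k` random numbers (simulates `k` units of work)."""
function generate_random_numbers(k)
    for _ in 1:k
        rand()
    end
    nothing
end

# Wall-clock time (seconds) for n serial repetitions of the workload.
function time_serial(n, k)
    t0 = time_ns()
    for _ in 1:n
        generate_random_numbers(k)
    end
    (time_ns() - t0) / 1e9
end

# Wall-clock time (seconds) with the n repetitions split across threads.
function time_threaded(n, k)
    t0 = time_ns()
    @threads for _ in 1:n
        generate_random_numbers(k)
    end
    (time_ns() - t0) / 1e9
end
```

Sweeping n for a fixed k and recording both times gives the data for one curve of the plot (e.g. with Plots.jl).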
Question 2
Continuing from the results of the previous question, and assuming that n is equal to the number of available threads, what is the minimum workload size k needed to see an improvement from using multithreading?
Question 3
Write your own multithreaded mapreduce implementation, using the same format as the following serial implementation:
function custom_mapreduce(map_fn, reduction_op, array)
    @assert length(array) >= 2
    mapped_results = map_fn.(array)
    acc = reduction_op(mapped_results[1], mapped_results[2])
    for i in 3:length(array)
        acc = reduction_op(acc, mapped_results[i])
    end
    return acc
end
One can assume that the reduction operation is associative.
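One approach, sketched below, is to give each thread a contiguous chunk to reduce and then combine the per-chunk results; this relies on the associativity assumption, and the name threaded_mapreduce is ours:

```julia
using Base.Threads

# Sketch of a multithreaded mapreduce: each thread reduces one chunk,
# then the per-chunk partial results are combined serially.
# Assumes reduction_op is associative.
function threaded_mapreduce(map_fn, reduction_op, array)
    @assert length(array) >= 2
    nchunks = min(nthreads(), length(array))
    chunks = collect(Iterators.partition(eachindex(array),
                                         cld(length(array), nchunks)))
    partial = Vector{Any}(undef, length(chunks))
    @threads for c in eachindex(chunks)
        idxs = chunks[c]
        acc = map_fn(array[first(idxs)])
        for i in Iterators.drop(idxs, 1)
            acc = reduction_op(acc, map_fn(array[i]))
        end
        partial[c] = acc
    end
    # Serial combine of the per-thread results.
    acc = partial[1]
    for c in 2:length(partial)
        acc = reduction_op(acc, partial[c])
    end
    return acc
end
```

For example, threaded_mapreduce(x -> x^2, +, [1, 2, 3, 4]) computes the sum of squares, 30.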
Question 4
Take the following function:
function randomised_workload(min_time, max_time)
    # Generate a random amount of time to wait
    time = rand() * (max_time - min_time) + min_time
    sleep(time)
    nothing
end
This will allow you to simulate a process whose completion time is uniformly random. Use this in an experiment to compare the effectiveness of chunking a serial implementation versus a simple threaded implementation when the workload size is randomised. How large does the workload size variability need to be for the simpler implementation to be more effective?
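A minimal experimental scaffold is sketched below; it contrasts static chunking (where @threads splits the iterations evenly up front) with dynamic scheduling via one @spawn per workload, and the function names run_chunked and run_spawned are ours:

```julia
using Base.Threads

function randomised_workload(min_time, max_time)
    # Generate a random amount of time to wait
    time = rand() * (max_time - min_time) + min_time
    sleep(time)
    nothing
end

# Static chunking: @threads splits the n workloads evenly across threads,
# so one unlucky chunk of slow tasks can delay the whole loop.
function run_chunked(n, min_time, max_time)
    @threads for _ in 1:n
        randomised_workload(min_time, max_time)
    end
end

# Dynamic scheduling: spawn one task per workload and wait for all,
# letting idle threads pick up remaining tasks.
function run_spawned(n, min_time, max_time)
    @sync for _ in 1:n
        @spawn randomised_workload(min_time, max_time)
    end
    nothing
end
```

Timing both (e.g. with @elapsed) while widening the gap between min_time and max_time shows how variability affects each strategy.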
Question 5
Imagine a function f whose runtime can be predicted via t(x) = a*x^b, where a and b are real positive constants and b > 1. One needs to evaluate f at x = 1, 2, …, N. If one has m workers available to calculate the results of f for each of the required x, how must the tasks be scheduled amongst the workers to minimise runtime? Repeat the same analysis, now using b < 1.
Note that answers need not be exact, just more efficient than a naive scheduler.
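One classic heuristic worth comparing against a naive scheduler is "longest processing time first": sort the tasks by decreasing predicted cost and greedily assign each to the currently least-loaded worker. A sketch follows; the function name lpt_schedule and the idea of passing precomputed costs are ours:

```julia
# Greedy longest-processing-time-first scheduler.
# `costs` holds the predicted runtime of each task; `m` is the
# number of workers. Returns the task indices assigned to each
# worker and the resulting makespan (the maximum worker load).
function lpt_schedule(costs, m)
    loads = zeros(m)                      # total predicted cost per worker
    assignment = [Int[] for _ in 1:m]     # task indices per worker
    for i in sortperm(costs; rev=true)    # largest tasks first
        w = argmin(loads)                 # least-loaded worker so far
        loads[w] += costs[i]
        push!(assignment[w], i)
    end
    return assignment, maximum(loads)
end
```

For instance, lpt_schedule([5.0, 4.0, 3.0, 2.0], 2) balances the four tasks across two workers with a makespan of 7.0.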
Footnotes
- Note that one can also use auto as the number of threads (julia -t auto) to use all logical cores.↩