Exercises
Before beginning the exercises, check that your computer has multiple cores, then run Julia with access to additional threads: either set the environment variable JULIA_NUM_THREADS and restart Julia, or launch Julia with the command julia -t 4, which starts the Julia REPL with 4 threads1. The number of threads should be set to the number of logical cores on your computer. One can check the number of available threads by calling Threads.nthreads() in the REPL.
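As a quick check, the following snippet prints the thread count for the current session (it assumes Julia was started with one of the options above):

```julia
# Verify how many threads this Julia session can use.
# Started e.g. with `julia -t 4` or with JULIA_NUM_THREADS set.
println("Threads available: ", Threads.nthreads())
```

If this prints 1, multithreaded code will still run, but on a single thread.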
Question 1
Consider the function:
"""Generates `k` random numbers."""
function generate_random_numbers(k)
    for _ in 1:k
        rand()
    end
    nothing
end
This function can simulate a variable amount of work: it can be called via generate_random_numbers(k), where k is chosen to mimic short or long workloads. One can imagine having to repeat this work n times. Empirically measure the time taken for these variable workloads as a function of n, with and without Threads.@threads. Use these times to create a plot showing the performance of the multithreaded task and the serial task versus the number of repetitions n. Reproduce this graph for varying workload sizes k, e.g. a small k (short), a moderate k (medium) and a large k (long), adjusting the range of n used to compensate for the increased workload of the loop.
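One possible timing harness is sketched below; the helper names time_serial and time_threaded are ours, not from the exercise, and a real benchmark would average several runs:

```julia
using Base.Threads  # provides @threads

"""Generates `k` random numbers (simulates `k` units of work)."""
function generate_random_numbers(k)
    for _ in 1:k
        rand()
    end
    nothing
end

# Wall-clock time (seconds) for n serial repetitions of the workload.
function time_serial(n, k)
    t0 = time_ns()
    for _ in 1:n
        generate_random_numbers(k)
    end
    (time_ns() - t0) / 1e9
end

# Wall-clock time (seconds) with the n repetitions split across threads.
function time_threaded(n, k)
    t0 = time_ns()
    @threads for _ in 1:n
        generate_random_numbers(k)
    end
    (time_ns() - t0) / 1e9
end
```

Sweeping n for a fixed k and recording both times gives the data for one curve of the plot (e.g. with Plots.jl).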
Question 2
Continuing from the results of the previous question, and assuming that n is equal to the number of available threads, what is the minimum workload size k needed to see an improvement from using multithreading?
Question 3
Write your own multithreaded mapreduce implementation, using the same format as the following serial implementation:
function custom_mapreduce(map_fn, reduction_op, array)
    @assert length(array) >= 2
    mapped_results = map_fn.(array)
    acc = reduction_op(mapped_results[1], mapped_results[2])
    for i in 3:length(array)
        acc = reduction_op(acc, mapped_results[i])
    end
    return acc
end
One can assume that the reduction operation is associative.
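One approach, sketched below, is to give each thread a contiguous chunk to reduce and then combine the per-chunk results; this relies on the associativity assumption, and the name threaded_mapreduce is ours:

```julia
using Base.Threads

# Sketch of a multithreaded mapreduce: each thread reduces one chunk,
# then the per-chunk partial results are combined serially.
# Assumes reduction_op is associative.
function threaded_mapreduce(map_fn, reduction_op, array)
    @assert length(array) >= 2
    nchunks = min(nthreads(), length(array))
    chunks = collect(Iterators.partition(eachindex(array),
                                         cld(length(array), nchunks)))
    partial = Vector{Any}(undef, length(chunks))
    @threads for c in eachindex(chunks)
        idxs = chunks[c]
        acc = map_fn(array[first(idxs)])
        for i in Iterators.drop(idxs, 1)
            acc = reduction_op(acc, map_fn(array[i]))
        end
        partial[c] = acc
    end
    # Serial combine of the per-thread results.
    acc = partial[1]
    for c in 2:length(partial)
        acc = reduction_op(acc, partial[c])
    end
    return acc
end
```

For example, threaded_mapreduce(x -> x^2, +, [1, 2, 3, 4]) computes the sum of squares, 30.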
Question 4
Take the following function:
function randomised_workload(min_time, max_time)
    # Generate a random amount of time to wait
    time = rand() * (max_time - min_time) + min_time
    sleep(time)
    nothing
end
This will allow you to simulate a process whose completion time is uniformly random. Use this in an experiment to compare the effectiveness of chunking a serial implementation versus a simple threaded implementation when the workload size is randomised. How large does the workload size variability need to be for the simpler implementation to be more effective?
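A minimal experimental scaffold is sketched below; it contrasts static chunking (where @threads splits the iterations evenly up front) with dynamic scheduling via one @spawn per workload, and the function names run_chunked and run_spawned are ours:

```julia
using Base.Threads

function randomised_workload(min_time, max_time)
    # Generate a random amount of time to wait
    time = rand() * (max_time - min_time) + min_time
    sleep(time)
    nothing
end

# Static chunking: @threads splits the n workloads evenly across threads,
# so one unlucky chunk of slow tasks can delay the whole loop.
function run_chunked(n, min_time, max_time)
    @threads for _ in 1:n
        randomised_workload(min_time, max_time)
    end
end

# Dynamic scheduling: spawn one task per workload and wait for all,
# letting idle threads pick up remaining tasks.
function run_spawned(n, min_time, max_time)
    @sync for _ in 1:n
        @spawn randomised_workload(min_time, max_time)
    end
    nothing
end
```

Timing both (e.g. with @elapsed) while widening the gap between min_time and max_time shows how variability affects each strategy.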
Question 5
Imagine a function f whose runtime can be predicted via t(x) = a*x^b, where a and b are real positive constants and b > 1. One needs to evaluate f at x = 1, 2, …, N. If one has m workers available to calculate the results of f for each of the required x, how must the tasks be scheduled amongst the workers to minimise runtime? Repeat the same analysis, now using b < 1.
Note that answers need not be exact, just more efficient than a naive scheduler.
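One classic heuristic worth comparing against a naive scheduler is "longest processing time first": sort the tasks by decreasing predicted cost and greedily assign each to the currently least-loaded worker. A sketch follows; the function name lpt_schedule and the idea of passing precomputed costs are ours:

```julia
# Greedy longest-processing-time-first scheduler.
# `costs` holds the predicted runtime of each task; `m` is the
# number of workers. Returns the task indices assigned to each
# worker and the resulting makespan (the maximum worker load).
function lpt_schedule(costs, m)
    loads = zeros(m)                      # total predicted cost per worker
    assignment = [Int[] for _ in 1:m]     # task indices per worker
    for i in sortperm(costs; rev=true)    # largest tasks first
        w = argmin(loads)                 # least-loaded worker so far
        loads[w] += costs[i]
        push!(assignment[w], i)
    end
    return assignment, maximum(loads)
end
```

For instance, lpt_schedule([5.0, 4.0, 3.0, 2.0], 2) balances the four tasks across two workers with a makespan of 7.0.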
Footnotes
- Note that one can also use auto as the number of threads (julia -t auto) to use all logical cores.↩