
Multithreading

Since there are many types of parallelism, we must first be clear about what multithreading is. Multithreading is the most common parallel programming paradigm you will come across, at least in languages like Julia, C#, etc. It is usually the easiest paradigm to implement, and can usually be added to serial code with only minor tweaks. Multithreading has the following traits:

  • A process can have many threads, each thread being a distinct, self-contained sequence of instructions that can execute in parallel or concurrently1.
  • A thread is an abstract unit of work which is usually mapped one-to-one with a CPU core. A CPU is oversubscribed if the number of concurrent threads being executed is larger than the number of CPU cores; in that case the CPU must spend time switching between threads to give the illusion of full parallelism, at the cost of degraded performance.
  • In the multithreading paradigm, threads can access shared memory, and so can read from and write to the same variables.
  • As each CPU core has its own L1 cache, and some cores do not share L2 cache, the cache should be considered local to each thread.
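
The shared-memory trait above can be sketched in a few lines. This is a minimal illustration in Python (one of many languages with this threading model, alongside Julia and C#); the names `results` and `square` are our own, not from the text. Each thread writes into its own slot of a list that all threads can see, so no explicit communication is needed:

```python
import threading

# Shared memory: every thread reads and writes the same list.
results = [0] * 4

def square(i):
    # Each thread writes to a distinct slot, so no race can occur here.
    results[i] = i * i

threads = [threading.Thread(target=square, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [0, 1, 4, 9]
```

Because each thread touches a distinct index, this particular pattern is safe; writing to overlapping locations is where the race conditions discussed below come from.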

Simultaneous Multithreading (SMT)

As an additional complication, many modern CPUs have Simultaneous Multithreading (SMT, also known as Hyper-Threading on Intel chips). SMT adds registers to each CPU core that effectively “double” the number of threads the core can process. The main idea is that different tasks may need different units within the CPU core: one task may require the ALU (Arithmetic Logic Unit, the unit that handles mathematical operations), while another only needs to manage memory. With SMT, both tasks can effectively run in parallel on the core, as if we had two physical cores. This speed-up only works if the resources each task needs at any one time do not overlap. Adding SMT to a chip is a cheap way of increasing its throughput: SMT usually increases the performance of some code by only around 20 to 30%, but costs manufacturers very little to add. Unfortunately, in scientific computing we rarely have tasks that require different resources on each CPU core, rendering SMT of little practical benefit. When choosing the number of threads to use, a safe bet is the number of physical cores. On Windows, Task Manager will show double the number of cores you actually have, as it shows logical cores rather than physical cores.
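
The logical-versus-physical distinction is easy to check in code. A small Python sketch (illustrative, not from the text) using the standard library:

```python
import os

# os.cpu_count() reports *logical* cores; on a CPU with SMT enabled this
# is typically double the number of physical cores.
logical = os.cpu_count()
print(f"Logical cores: {logical}")

# The Python standard library has no portable physical-core count.
# The third-party psutil package exposes one (assumption: psutil installed):
#   import psutil
#   physical = psutil.cpu_count(logical=False)
```

Following the advice above, a safe thread count is the physical-core figure, not the logical one.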

In comparison with other parallel programming paradigms, multithreading is distinct in that it is a shared memory paradigm, meaning that multiple workers can work on the same piece of memory. Shared memory between workers is both a blessing and a curse: a blessing, as one does not need to manage communication of memory between the workers; a curse, since multiple workers can access the same memory, leading to potential race conditions, which are easy to introduce but hard to detect.
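
The classic race condition is an unprotected read-modify-write on a shared variable. This Python sketch (the helper names are our own) shows the unsafe pattern and the usual fix, a lock that makes the update atomic:

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_add(n):
    global counter
    for _ in range(n):
        # Read-modify-write is not atomic: two threads can read the same
        # old value and one increment is silently lost.
        counter += 1

def safe_add(n):
    global counter
    for _ in range(n):
        with lock:  # only one thread at a time may run this block
            counter += 1

threads = [threading.Thread(target=safe_add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 400000 with the lock; unsafe_add may lose updates
```

That the unlocked version often *happens* to produce the right answer is exactly why races are easy to introduce but hard to detect.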

Footnotes


  1. Concurrency denotes the ability to execute parts (in this case threads) out of order, or to partially process one part and then stop to work on another. One can think of concurrency as what happens to parallel code when there is only a single worker: the worker quickly switches between many tasks, giving the illusion of working on them in parallel.
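
The single-worker switching described in this footnote can be demonstrated with Python's asyncio event loop, which runs many tasks concurrently on one thread (an illustrative sketch; the names `task` and `order` are our own):

```python
import asyncio

order = []

async def task(name):
    for step in (1, 2):
        order.append(f"{name}{step}")
        # Yield control so the single worker can switch to the other task.
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(task("a"), task("b"))

asyncio.run(main())
print(order)  # the two tasks' steps interleave, e.g. ['a1', 'b1', 'a2', 'b2']
```

Neither task finishes before the other starts: one worker makes partial progress on each, which is concurrency without parallelism.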