von Neumann Architecture

Practically all modern computers that you will use can be described using the language of a von Neumann machine. Figure 1 shows a diagram of this type of architecture. A Central Processing Unit (CPU) is connected to a memory unit that stores both instructions and data. The CPU can store data in memory and retrieve data from a given memory address. Memory can be thought of as a huge table of data, where each slot has an address that uniquely specifies its location in the table.
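To make the "huge table" picture concrete, here is a minimal Python sketch in which each slot of memory is identified by an integer address. The `Memory` class and its `read`/`write` methods are hypothetical names, used purely for illustration. Real memory is typically addressed in bytes and is vastly larger, but the idea of reading and writing by address is the same.

```python
class Memory:
    """Toy memory: a fixed-size table of slots, indexed by integer address."""

    def __init__(self, size: int) -> None:
        self.slots = [0] * size  # every slot starts out holding 0

    def read(self, address: int) -> int:
        """Retrieve the value stored at the given address."""
        return self.slots[address]

    def write(self, address: int, value: int) -> None:
        """Store a value at the given address, overwriting what was there."""
        self.slots[address] = value


memory = Memory(16)
memory.write(3, 42)    # store 42 in the slot at address 3
print(memory.read(3))  # 42
print(memory.read(4))  # 0 (nothing has been written there yet)
```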

**Figure 1:** A diagram of the basic von Neumann computer architecture. Shows a Central Processing Unit (CPU), connected to a memory unit via a bus. Inside the CPU, there are several registers, coloured green. Each register is capable of storing one piece of data, typically 32 or 64 bits. The registers included are the Program Counter (PC), Current Instruction Register (CIR), Memory Address Register (MAR), Memory Data Register (MDR) and the Accumulator. The CPU contains a Control Unit (CU) and Arithmetic Logic Unit (ALU).

Inside the CPU, there are several registers. Each register is capable of storing some small amount of data, typically 32 or 64 bits, depending on the system. Each register serves a specific purpose:

Program Counter (PC): Holds the address of the next instruction to be executed.
Current Instruction Register (CIR): Holds the instruction currently being executed, usually made up of an opcode and an operand.
Memory Address Register (MAR): Holds the address of the memory location that data should be read from or written to, depending on the instruction.
Memory Data Register (MDR): Temporarily stores data read from or written to memory; it acts as a buffer for data moving in and out of memory.
Accumulator (ACC): A general-purpose register, used to temporarily store data required for processing. Most CPUs have several general-purpose registers.
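As an illustration, the sketch below models these registers as fields of a Python dataclass. The `Registers` class and the `load_into_accumulator` function are hypothetical names. The sketch shows how the MAR and MDR cooperate when a value is copied from memory into the accumulator. A real CPU does this with circuitry rather than assignments.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    """Hypothetical register set mirroring the list above."""
    pc: int = 0   # Program Counter: address of the next instruction
    cir: int = 0  # Current Instruction Register: instruction being executed
    mar: int = 0  # Memory Address Register: address to read from / write to
    mdr: int = 0  # Memory Data Register: buffer for data to/from memory
    acc: int = 0  # Accumulator: general-purpose working register


def load_into_accumulator(regs: Registers, memory: list[int], address: int) -> None:
    """Copy memory[address] into the accumulator via the MAR and MDR."""
    regs.mar = address           # 1. record *where* in memory to read
    regs.mdr = memory[regs.mar]  # 2. the value arrives in the MDR buffer
    regs.acc = regs.mdr          # 3. move it into the working register


memory = [0, 0, 0, 99]
regs = Registers()
load_into_accumulator(regs, memory, 3)
print(regs)  # Registers(pc=0, cir=0, mar=3, mdr=99, acc=99)
```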

The processor has circuitry to orchestrate execution across the CPU, most of which lives inside the Control Unit (CU). In particular, this component manages how individual instructions are executed.

Finally, the Arithmetic Logic Unit (ALU) contains circuits that perform basic binary operations on data stored in registers. These can be low-level operations such as bitwise XOR, bit shifts or NOT. Most modern CPUs also have circuitry for more complex arithmetic on two pieces of data (e.g. multiplication, division etc.), again with the operands held in the CPU's registers. Which operation is performed depends on the instruction being processed by the Control Unit.
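A toy ALU can be sketched as a function that applies one of several operations to register values. This Python sketch rests on two assumptions. First, registers are 32 bits wide. Second, the operation is selected by a string name such as `"XOR"`, whereas a real Control Unit selects it with control signals derived from the decoded opcode. Masking the result with `MASK` mimics the fact that a register can only hold a fixed number of bits.

```python
WIDTH = 32               # assumed register width in bits
MASK = (1 << WIDTH) - 1  # keeps results within 32 bits, like a real register


def alu(op: str, a: int, b: int = 0) -> int:
    """Toy ALU: apply the operation selected by `op` to register values a and b."""
    if op == "XOR":
        result = a ^ b
    elif op == "NOT":
        result = ~a      # single-operand operation; b is ignored
    elif op == "SHL":
        result = a << b  # shift a left by b bits
    elif op == "SHR":
        result = a >> b  # shift a right by b bits (a is non-negative here)
    elif op == "ADD":
        result = a + b
    elif op == "MUL":
        result = a * b
    else:
        raise ValueError(f"unsupported operation: {op}")
    return result & MASK  # discard any bits beyond the register width


print(alu("XOR", 0b1100, 0b1010))  # 6
print(alu("NOT", 0))               # 4294967295 (all 32 bits set)
print(alu("SHL", 1, 4))            # 16
print(alu("MUL", 6, 7))            # 42
```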

Instruction Sets

A CPU has an instruction set that it is able to process. This is a collection of operations, each usually identified by an opcode, that define how the CPU processes information. For example, one instruction may fetch a piece of data from memory at a specific address. Another may add a number to the value currently stored in the accumulator and write the result back into the accumulator. CPUs usually implement a standard Instruction Set Architecture (ISA), which defines the set of available instructions, registers and data types that the CPU supports. A standard ISA allows the same compiled software to run on different CPUs, as long as they implement the same ISA.
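The sketch below defines a hypothetical, minimal instruction set for an accumulator machine. It has four opcodes, and each instruction is packed into a single integer with the opcode in the high bits and an 8-bit operand in the low bits. The names and the encoding are our own invention for illustration and do not correspond to any real ISA.

```python
# Hypothetical opcodes; the mnemonics exist only for human readers.
OPCODES = {
    "HALT": 0,   # stop execution
    "LOAD": 1,   # ACC <- memory[operand]
    "ADD": 2,    # ACC <- ACC + memory[operand]
    "STORE": 3,  # memory[operand] <- ACC
}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(mnemonic: str, operand: int = 0) -> int:
    """Pack an instruction: opcode in the high bits, operand in the low 8 bits."""
    if not 0 <= operand < 256:
        raise ValueError("operand must fit in 8 bits")
    return (OPCODES[mnemonic] << 8) | operand


def disassemble(word: int) -> str:
    """Turn an encoded instruction back into readable text."""
    return f"{MNEMONICS[word >> 8]} {word & 0xFF}"


program = [encode("LOAD", 10), encode("ADD", 11), encode("STORE", 12), encode("HALT")]
print(program)                            # [266, 523, 780, 0]
print([disassemble(w) for w in program])  # ['LOAD 10', 'ADD 11', 'STORE 12', 'HALT 0']
```

The encoded program is just a list of numbers. It only means "load, add, store, halt" to a CPU that uses this exact encoding. A CPU with a different ISA would interpret the same numbers as entirely different instructions, which is why software is compiled for a particular ISA.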

Having an ISA is a useful abstraction, allowing the same program (a collection of instructions and data) to run on different systems. For example, Visual Studio Code can be downloaded as a Windows x86_64 executable that runs on a wide range of AMD and Intel processors, even though their hardware differs. In this example, x86_64 is the instruction set architecture that the executable's instructions are written for. x86 is one of the most popular instruction sets, and is an example of a Complex Instruction Set Computer (CISC) architecture. The 64 refers to the number of bits the CPU processes at once, which usually corresponds to the width of its registers. A CPU with only 32-bit registers cannot run a program compiled for a 64-bit instruction set. However, 64-bit x86 processors are generally able to run 32-bit programs.
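To see what register width means in practice, the following Python sketch emulates an unsigned register of a given width by keeping only its lowest bits. Python's own integers have no fixed width, so the hypothetical `wrap` helper does the masking. A value that needs 33 bits fits in a 64-bit register but loses its top bit in a 32-bit one.

```python
def wrap(value: int, bits: int) -> int:
    """Keep only the lowest `bits` bits, as an unsigned register of that width would."""
    return value & ((1 << bits) - 1)


big = 2**32 + 5       # needs 33 bits to represent
print(wrap(big, 32))  # 5: the top bit is lost in a 32-bit register
print(wrap(big, 64))  # 4294967301: fits comfortably in 64 bits

for bits in (32, 64):
    # 32 -> 4294967295, 64 -> 18446744073709551615
    print(f"largest unsigned {bits}-bit value: {(1 << bits) - 1}")
```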

While the x86 instruction set (and its 64-bit variant) is still extremely popular, another type of instruction set is becoming more prevalent. Arm is an example of a Reduced Instruction Set Computer (RISC) architecture, and is used in many mobile phone CPUs as well as newer Apple products. The differences between the x86 and Arm instruction sets are far beyond the scope of this module, but it is sufficient to know that they are different: a program compiled for one cannot run natively on the other.
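If you are curious which ISA your own machine uses, Python's standard `platform.machine()` function reports the machine type, which usually names the architecture. The exact strings vary by operating system, for example `x86_64` on Linux, `AMD64` on Windows and `arm64` on macOS with Apple silicon. The `isa_family` helper below is therefore a hypothetical, best-effort mapping rather than an exhaustive one. Under emulation or translation layers, the reported value may describe the emulated architecture rather than the physical CPU.

```python
import platform


def isa_family(machine: str) -> str:
    """Map a machine-type string to a rough ISA family (best effort, not exhaustive)."""
    name = machine.lower()
    if name in {"x86_64", "amd64"}:
        return "x86_64 (CISC)"
    if name in {"i386", "i686", "x86"}:
        return "32-bit x86 (CISC)"
    if name in {"arm64", "aarch64"}:
        return "64-bit Arm (RISC)"
    return "unknown"


machine = platform.machine()  # may be an empty string if it cannot be determined
print(f"machine type {machine!r} -> {isa_family(machine)}")
```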

Finally, the instructions that a CPU can execute depend on the hardware actually present on the chip. Some instructions can only be executed on CPUs that have the corresponding circuitry built in. For this course, the most important of these features are vector instructions, which allow the CPU to process multiple numbers with a single instruction. We will discuss these instructions later in the course.
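Python cannot issue vector instructions directly, so the sketch below only *simulates* the idea. It counts one "instruction" per element for a scalar add, and one per group of `LANES` elements for a vector add. The width of 4 lanes is an assumption, since real vector widths depend on the hardware and the data type. The point is the instruction count, not the speed of this Python code.

```python
LANES = 4  # assumed vector width: 4 numbers per simulated vector instruction


def scalar_add(a: list[float], b: list[float]) -> tuple[list[float], int]:
    """Add element by element, counting one instruction per element."""
    out, instructions = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        instructions += 1
    return out, instructions


def vector_add(a: list[float], b: list[float]) -> tuple[list[float], int]:
    """Add up to LANES elements per step, counting one instruction per step."""
    out, instructions = [], 0
    for i in range(0, len(a), LANES):
        chunk_a, chunk_b = a[i:i + LANES], b[i:i + LANES]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))  # one simulated vector op
        instructions += 1
    return out, instructions


a = [float(i) for i in range(10)]
b = [10.0] * 10
print(scalar_add(a, b)[1])  # 10
print(vector_add(a, b)[1])  # 3 (chunks of 4, 4 and 2 elements)
```

Both functions produce the same sums. The vector version needs only three steps for ten elements, with the final step handling the two leftover elements.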

Fetch-Decode-Execute Cycle

Now that we know about the instructions available on a given CPU, how are they actually executed? Once a program is loaded into memory, the address of its first instruction is loaded into the PC register. The following cycle then repeats until an end condition is reached, such as a halt instruction.

Fetch

Step 1: The address in the PC is loaded into the MAR.
Step 2: The contents of the memory location addressed by the MAR are fetched and stored in the MDR. Simultaneously, the PC is incremented to point at the next instruction.
Step 3: The contents of the MDR are copied into the CIR.
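The fetch steps can be sketched in Python as follows. The sketch uses a hypothetical `CPU` dataclass that holds only the registers involved, and a list standing in for memory. The memory words are arbitrary numbers standing in for encoded instructions. In hardware the PC increment happens at the same time as the memory read; here it is simply a separate line.

```python
from dataclasses import dataclass


@dataclass
class CPU:
    pc: int = 0   # Program Counter
    mar: int = 0  # Memory Address Register
    mdr: int = 0  # Memory Data Register
    cir: int = 0  # Current Instruction Register


def fetch(cpu: CPU, memory: list[int]) -> None:
    """Perform the three fetch steps on a toy CPU."""
    cpu.mar = cpu.pc           # Step 1: copy the PC into the MAR
    cpu.mdr = memory[cpu.mar]  # Step 2: read the addressed word into the MDR...
    cpu.pc += 1                #         ...and point the PC at the next instruction
    cpu.cir = cpu.mdr          # Step 3: copy the MDR into the CIR


memory = [101, 202, 303]  # arbitrary stand-ins for encoded instructions
cpu = CPU()
fetch(cpu, memory)
print(cpu)  # CPU(pc=1, mar=0, mdr=101, cir=101)
```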

Decode

Step 4: The instruction now held in the CIR is decoded, usually into an opcode and an operand. The opcode determines which operation is performed next. The operand holds a piece of information used by the operation, such as the memory address of the data to be used, or the data itself.
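Decoding can be sketched with bit operations. The sketch assumes a hypothetical encoding in which the low 8 bits of an instruction word hold the operand and the remaining high bits hold the opcode. Real ISAs use more elaborate formats; in x86, instructions even vary in length.

```python
OPERAND_BITS = 8  # assumed: the low 8 bits hold the operand
OPERAND_MASK = (1 << OPERAND_BITS) - 1


def decode(instruction: int) -> tuple[int, int]:
    """Split an instruction word into (opcode, operand)."""
    opcode = instruction >> OPERAND_BITS  # the high bits select the operation
    operand = instruction & OPERAND_MASK  # the low bits carry an address or value
    return opcode, operand


print(decode(0x20B))  # (2, 11): opcode 2 with operand 11
```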

Execute

Step 5: The instruction specified by the decoded opcode is carried out. This can have a range of effects, depending on the instruction itself, and can affect many registers in the process.
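The execute step can be sketched as a dispatch on the opcode. The opcode numbers and the `CPU` fields below are hypothetical. The point is that each instruction changes a different part of the machine's state: `LOAD` and `ADD` update the accumulator, `STORE` writes to memory, and `JUMP` overwrites the program counter.

```python
from dataclasses import dataclass

HALT, LOAD, ADD, STORE, JUMP = 0, 1, 2, 3, 4  # hypothetical opcodes


@dataclass
class CPU:
    pc: int = 0
    acc: int = 0
    halted: bool = False


def execute(cpu: CPU, memory: list[int], opcode: int, operand: int) -> None:
    """Carry out one decoded instruction; different opcodes touch different state."""
    if opcode == LOAD:
        cpu.acc = memory[operand]   # changes the accumulator
    elif opcode == ADD:
        cpu.acc += memory[operand]  # reads memory, updates the accumulator
    elif opcode == STORE:
        memory[operand] = cpu.acc   # changes memory, not a register
    elif opcode == JUMP:
        cpu.pc = operand            # overwrites the program counter
    elif opcode == HALT:
        cpu.halted = True           # signals the end of the cycle
    else:
        raise ValueError(f"unknown opcode {opcode}")


memory = [5, 7, 0]
cpu = CPU()
execute(cpu, memory, LOAD, 0)
execute(cpu, memory, ADD, 1)
execute(cpu, memory, STORE, 2)
print(cpu.acc, memory)  # 12 [5, 7, 12]
```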

While this process seems fairly simple, and the range of instructions is very limited, this approach allows a CPU to perform extremely complicated tasks.
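Putting the three stages together gives a complete, if tiny, simulated von Neumann machine. In this Python sketch, the opcodes, the 8-bit operand encoding and the program are all hypothetical. The program and its data share one `memory` list. Using only load, add, subtract, store and two jump instructions, the program multiplies 6 by 7 through repeated addition, a small example of building a more complex task from very simple instructions.

```python
HALT, LOAD, ADD, STORE, SUB, JUMPZ, JUMP = range(7)  # hypothetical opcodes


def encode(opcode: int, operand: int = 0) -> int:
    """Opcode in the high bits, 8-bit operand in the low bits."""
    return (opcode << 8) | operand


def run(memory: list[int], max_cycles: int = 1000) -> list[int]:
    """Repeat fetch-decode-execute until HALT (or a safety limit is reached)."""
    pc = acc = 0
    for _ in range(max_cycles):
        # Fetch: read the word addressed by the PC, then advance the PC.
        mar = pc
        mdr = memory[mar]
        pc += 1
        cir = mdr
        # Decode: split into opcode and operand.
        opcode, operand = cir >> 8, cir & 0xFF
        # Execute.
        if opcode == HALT:
            return memory
        elif opcode == LOAD:
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == SUB:
            acc -= memory[operand]
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == JUMPZ:
            if acc == 0:          # conditional jump: only taken when ACC is zero
                pc = operand
        elif opcode == JUMP:
            pc = operand
        else:
            raise ValueError(f"unknown opcode {opcode} at address {mar}")
    raise RuntimeError("program did not halt within max_cycles")


COUNTER, Y, RESULT, ONE = 20, 21, 22, 23  # data addresses
memory = [0] * 32
program = [
    encode(LOAD, COUNTER),   # 0: ACC <- counter
    encode(JUMPZ, 8),        # 1: if counter == 0, jump to HALT
    encode(SUB, ONE),        # 2: ACC <- counter - 1
    encode(STORE, COUNTER),  # 3: counter <- ACC
    encode(LOAD, RESULT),    # 4: ACC <- result
    encode(ADD, Y),          # 5: ACC <- result + y
    encode(STORE, RESULT),   # 6: result <- ACC
    encode(JUMP, 0),         # 7: back to the start of the loop
    encode(HALT),            # 8: stop
]
memory[:len(program)] = program  # instructions and data live in the same memory
memory[COUNTER], memory[Y], memory[ONE] = 6, 7, 1
run(memory)
print(memory[RESULT])  # 42
```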

Additional Components

While Figure 1 shows the CPU and main memory, it omits other important connections. The CPU is usually also connected to a variety of I/O devices (e.g. keyboard, mouse, hard drive etc.), and usually has a dedicated component to handle communication with these devices. We will cover these additional components in more detail in the hardware section.

It is important to remember that the von Neumann architecture is only a simplified model. Modern CPUs are incredibly complicated, with many features and optimisations designed to get the most performance out of a single chip. These include placing multiple copies of the processing hardware on one chip (each copy is called a core) and adding circuitry that lets a single core process multiple streams of instructions simultaneously. Additionally, a large part of a modern CPU's chip area is devoted to cache: small, fast buffers of memory that let the CPU retrieve information quickly, without waiting for it to arrive from main memory (RAM). In the rest of this course, we will discuss the parts of modern hardware that we need to think about if we are to get the best performance out of our software.


High Performance Computing in Julia
