Compilers and Machine Code
While modern software is built through layers of abstraction, when it comes time for the code to run, the hardware must eventually execute the program. For a program to be executed, it must be translated into machine code. Machine code is made up of very simple, processor-dependent instructions, which manipulate data at the most atomic level, sometimes a single bit at a time. Machine code is very difficult for humans to read and write, which is why we now write source code in a higher-level language. This higher-level language is human-readable and serves as a convenient abstraction for expressing complex ideas, without having to think about the lower-level details of how the code is executed. Another piece of software is then required to take the human-readable source code and translate it, sometimes through multiple layers of abstraction, into machine code, which is then executed on the system. The exact process from source code to machine code varies greatly depending on the language used. In this section, we will highlight the most common approaches, with particular focus on how Julia handles this process.
Ahead-of-Time Static Compilation
Languages traditionally used for high-performance computing, such as C or Fortran, are usually compiled ahead of time. This means that the programmer runs a piece of software called a compiler, which translates their high-level, human-readable source code into low-level machine code, capable of executing on a specific Instruction Set Architecture (such as x86).
This process is completed before any code is executed, and all code that the program will require at runtime needs to be generated at this stage. For this reason, ahead-of-time compiled languages commonly require the programmer to rigorously define the data types of all variables and the return types of all functions used throughout the code. If a language has such a requirement, we call it statically typed. This approach lends itself to compiling a small program efficiently, without generating machine code that will never be used.
Compilation is usually quite computationally costly, and large programs can have compile times that range from seconds to hours. The compilation process not only translates the source code into machine code, but also performs optimisations on the resulting code so that it runs efficiently on the target hardware. If a user wants to execute the program on different hardware (e.g. on Arm instead of x86), they have to recompile the entire project from scratch and use that binary instead.
Ahead-of-time compiled languages have a reputation for high performance. There are a variety of reasons why compilation helps when you are looking to get the best performance from your software. One advantage is that much of the translation work is moved outside the runtime of the program. In general, the developer pays this cost once and then distributes the binary to all users (e.g. your web browser comes precompiled). This means that the user can start using the program instantly and does not have to pay the translation cost.
Just-in-Time Compilation
Another form of compilation is the Just-in-Time approach, which is the approach adopted by the Julia language. When we run a Julia program, we are actually launching the Julia runtime, which includes several standard libraries, along with a compiler that can turn source code into machine code. When your script calls a function that has not yet been compiled, the Julia runtime will compile the necessary machine code on the fly and save the compiled result for later. If we then call the same function again, with arguments of the same data types, the earlier compiled function will be reused.
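The caching idea can be sketched in Python (this is purely illustrative and not how Julia's compiler is implemented; the fake_compile and call_specialised names are my own): a "compiled" version of a function is produced the first time it is called with a given combination of argument types, cached, and then reused on every later call with those same types.

```python
compiled_cache = {}

def fake_compile(func, arg_types):
    """Stand-in for the compiler: in a real JIT this step would generate
    type-specialised machine code for `func`."""
    print(f"compiling {func.__name__} for {arg_types}")
    return func  # a real JIT would return an optimised compiled version

def call_specialised(func, *args):
    key = (func, tuple(type(a) for a in args))
    if key not in compiled_cache:          # first call with these types?
        compiled_cache[key] = fake_compile(func, key[1])
    return compiled_cache[key](*args)      # reuse the cached "compilation"

def myfactorial(n):
    return 1 if n in (0, 1) else n * call_specialised(myfactorial, n - 1)

print(call_specialised(myfactorial, 5))    # "compiles" once for (int,)
print(call_specialised(myfactorial, 5.0))  # "compiles" again for (float,)
```

Running this prints a "compiling" message only on the first call for each argument type; subsequent calls hit the cache directly.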
The main reason that Julia does this is to allow the language to be dynamically typed, while also gaining the benefits of compiling code “just ahead of time”. In Julia, it is not necessary to define the data types of variables ahead of time, even though every variable will implicitly (or explicitly) have a data type at runtime. The Julia runtime uses this type information to generate efficient machine code for your functions, and only generates the machine code that is actually needed at runtime. Additionally, we can reuse the same implementation, with fast machine code, across several data types. Let’s look at an example in Julia:
function myfactorial(n)
    if n == 1 || n == 0
        return 1
    end
    return n * myfactorial(n-1)
end
This defines a recursive factorial function. You can see that no data types have been specified. When we execute the function with an integer input, the output is of the same type:
@show typeof(4)
@show typeof(myfactorial(4))
typeof(4) = Int64
typeof(myfactorial(4)) = Int64
If you were to run this interactively2 in your own session, you would notice a small delay when running myfactorial(4) for the first time. If you were to run the function again, it would be much faster. This is because the first time the code is executed, Julia compiles the function myfactorial, which takes a little bit of time.
If we repeat this test using a floating-point input, you can see that the runtime will compile a new version of the function, specialised to handle floats:
@show typeof(4.0)
@show typeof(myfactorial(4.0))
typeof(4.0) = Float64
typeof(myfactorial(4.0)) = Float64
This shows that the generic implementation of myfactorial can be specialised to different input types without requiring a rewrite of the code. The compiled code will likely run at a speed similar to an ahead-of-time compiled implementation in C or Fortran.
In statically typed languages like C, we would be required to write two different implementations of myfactorial, one using int and the other using float as the data type of the input argument and the return type of the function, i.e.
int myfactoriali(int n) {
    if (n == 0 || n == 1) {
        return 1;
    }
    return n * myfactoriali(n - 1);
}

float myfactorialf(float n) {
    if (n == 0.0f || n == 1.0f) {
        return 1.0f;
    }
    return n * myfactorialf(n - 1.0f);
}
Our generic Julia implementation allows us to use the myfactorial function for a variety of data types without changing the source code, and we only pay the compilation cost for the types that we actually use.
Julia is not the only dynamically typed language out there; Python is probably the most famous example. Where Python and Julia differ is that Julia uses Just-in-Time (or, more accurately, just-ahead-of-time) compilation of functions, and will specialise (i.e. compile specific, efficient implementations) based on the argument types of all inputs.
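To see the dynamic-typing half of this in Python (the factorial definition below mirrors the Julia one above): a single definition accepts any numeric type, but rather than compiling a specialised version per type, the interpreter re-checks the types on every call.

```python
from fractions import Fraction

def myfactorial(n):
    if n == 1 or n == 0:
        return 1
    return n * myfactorial(n - 1)

# One definition serves many numeric types, with no type annotations.
print(myfactorial(5))            # 120, an int
print(myfactorial(5.0))          # 120.0, a float
print(myfactorial(Fraction(5)))  # 120, held as an exact Fraction
```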
Interpreters
Languages like Python instead interpret the source code, by first compiling it into bytecode. Bytecode can be viewed as a partial compilation of the source code into an intermediate, higher-level representation of the program. This bytecode is then executed instruction by instruction by the Python interpreter. During execution, each bytecode instruction is translated into machine code or triggers the execution of precompiled machine code in the Python runtime, producing the equivalent result. For example, if we were to implement the factorial function in Python, it would look like the following:
def myfactorial(n):
    if n == 1 or n == 0:
        return 1
    else:
        return n * myfactorial(n-1)
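We can inspect the bytecode that CPython generates for this function using the standard dis module (the exact instruction names vary between Python versions):

```python
import dis

def myfactorial(n):
    if n == 1 or n == 0:
        return 1
    else:
        return n * myfactorial(n - 1)

# Print the bytecode instructions the interpreter will step through.
# Note how generic they are: the multiply instruction (BINARY_MULTIPLY,
# or BINARY_OP on newer versions) does not know whether it will multiply
# machine integers, big integers, or floats until it is executed.
dis.dis(myfactorial)
```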
This implementation is similar to the equivalent Julia implementation, but when it is called, such as with myfactorial(5), the interpreter must check the data type of n at every recursive call and determine the appropriate instruction to multiply two integers. Additionally, the interpreter performs other checks, such as ensuring that machine integers are used when the numbers are small enough. If you input a larger value such as 100, the runtime will silently convert the output of myfactorial from a machine integer to a “big integer”. This conversion incurs a significant performance overhead due to the extra computation required to handle large numbers.
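We can observe this promotion directly (the factorial definition below mirrors the one above):

```python
import sys

def myfactorial(n):
    if n == 1 or n == 0:
        return 1
    return n * myfactorial(n - 1)

small = myfactorial(15)   # fits comfortably in a 64-bit machine word
big = myfactorial(100)    # silently promoted to an arbitrary-precision integer

print(small)              # 1307674368000
print(big.bit_length())   # 525 bits: far wider than a 64-bit word
print(sys.getsizeof(big) > sys.getsizeof(small))  # True: more memory needed
```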
On my machine, Python takes around 2 microseconds to compute myfactorial(15). The equivalent implementation in Julia takes around 20 nanoseconds to execute, not including the compilation time. As a rough rule of thumb, compilation can improve performance by one to two orders of magnitude, depending on the code being executed. There are exceptions to this rule, which will become clear by the end of the module.
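You can reproduce the Python half of this measurement with the standard timeit module (absolute numbers will differ from machine to machine):

```python
import timeit

def myfactorial(n):
    if n == 1 or n == 0:
        return 1
    return n * myfactorial(n - 1)

# Average the cost of one call over many runs to smooth out noise.
runs = 100_000
total = timeit.timeit(lambda: myfactorial(15), number=runs)
print(f"{total / runs * 1e6:.3f} microseconds per call")
```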
Low-level vs High-level
When discussing different programming languages, one distinction that is often made is whether a language is high-level or low-level. These descriptors refer to the level of abstraction from machine code: the lower-level a language is, the closer it is to the machine code that the CPU runs. The vast majority of programming languages used today are considered high-level languages, as they are written in human-readable source code that must be converted into machine code before it can be executed. However, these distinctions are always relative. Many people refer to the C programming language as low-level, as it requires the programmer to manually handle many operations that are usually taken care of in higher-level languages like Julia or Python (e.g. memory management). Overall, how high- or low-level a language is simply depends on its level of abstraction away from the machine code that will eventually be run.
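As a concrete illustration of the memory management point, allocation in a high-level language like Python is implicit: the list below is allocated as it grows and reclaimed by the garbage collector, with no equivalent of C's malloc and free appearing in user code.

```python
def mean(n):
    # Memory for the list is allocated automatically as it is built...
    values = [float(i) for i in range(n)]
    return sum(values) / len(values)

print(mean(1000))  # 499.5
# ...and reclaimed automatically once `values` goes out of scope.
```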
Footnotes
- You can open a REPL session by running julia in the command line. You should be able to copy and paste any code snippet within these notes to try it for yourself.↩