JM

Type Stability

While Julia is a dynamic language, it can be very important for the compiler to know the type of the variables being used, so that it can specialise on it. We have been alluding to this throughout the book so far, but getting this wrong may have huge performance impacts. Let’s get a concrete example:

julia
function get_fibonacci(n)
    a = 1
    b = 1
    fibonacci_nums = []
    push!(fibonacci_nums, a)
    push!(fibonacci_nums, b)
    for i = 1:n-2
        a, b = b, a+b
        push!(fibonacci_nums, b)
    end
    return fibonacci_nums
end

This calculates the first n n Fibonacci numbers and stores the results in an array. Here, we get the result of the vector being of type Vector{Any}. Let’s see what effect this has on performance:

julia
fibonacci_nums = get_fibonacci(50);
typeof(fibonacci_nums)
julia
import BenchmarkTools: @benchmark, @btime, @belapsed
display(@benchmark sum($fibonacci_nums))
Output
BenchmarkTools.Trial: 10000 samples with 152 evaluations per sample.
 Range (min … max):  671.711 ns …   9.641 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     748.684 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   795.683 ns ± 224.152 ns  ┊ GC (mean ± σ):  1.00% ± 4.44%

   ▁▅▅▇██▆▅▅▅▅▄▃▂▂▂▂▂ ▁▁                                        ▂
  ▂█████████████████████▇▆▆▇▆▆▄▃▆▅▅▃▅▅▇▄▄▇▇▆▄▅▅▇█▇▆▅▅▇▅▅▅▇▃█▅▅▅ █
  672 ns        Histogram: log(frequency) by time       1.41 μs <

 Memory estimate: 608 bytes, allocs estimate: 38.

We can see that this sum had many allocations, and took a significant amount of time. If we simply make another array with the correct type, let us see the difference in performance:

julia
fibonacci_nums_with_type = [f for f in fibonacci_nums];
typeof(fibonacci_nums_with_type)
julia
display(@benchmark sum($fibonacci_nums_with_type))
Output
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  4.100 ns … 23.200 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.900 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.855 ns ±  0.474 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▅               █  ▄     ▁                          
  ▂▁▄▁▁▇▁▁█▁▁█▁█▁▁▅▁▁█▁▁▅▁█▁▁█▁▁▇▁▁█▁▂▁▁▃▁▁▂▁▁▂▁▂▁▁▂▁▁▂▁▁▂▁▂ ▃
  4.1 ns         Histogram: frequency by time         6.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Notice that the allocations were reduced by orders of magnitude. Additionally, the runtime of these algorithms improved dramatically. In this example, when Julia knew the type of the variables, the performance increased by a factor of almost 120. The reason why performance was so low for the first method, is that Julia had to check the type of each individual element in the array to make sure that it was the right type.

Understanding Type Instability

If the compiler cannot trace the types of the variables used throughout a function (each variable having a known concrete type at compile time), then the function is said to be type unstable. This means that the compiler will have to perform additional checks on variables with unknown types at runtime, causes performance hits.

Let’s take a look at an example of a type unstable function:

julia
function example_type_unstable_fn(x)
    s = 0
    for x_i in x
        s += x_i
    end
    s
end

This looks like a normal implementation. We can even run it and benchmark to see if it works:

julia
x = collect(1:100);
display(@benchmark example_type_unstable_fn($x))
Output
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  5.200 ns … 185.300 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.600 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.693 ns ±   2.452 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▅    █   ▄             ▅   ▆    ▂                        
  ▅▁▁▁█▁▁▁▁█▁▁▁█▁▁▁▁▇▁▁▁█▁▁▁▁█▁▁▁█▁▁▁▁█▁▁▁▂▁▁▁▁▂▁▁▁▂▁▁▁▁▂▁▁▁▂ ▃
  5.2 ns          Histogram: frequency by time         6.5 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

There are actually no issues with this code so far that we can see, but let’s show an example where some performance issues come out:

julia
function calc_summary(rows)
    return sum(example_type_unstable_fn, rows)
end

rows = [rand(rand(0:10)) for _ in 1:100];
display(@benchmark calc_summary($rows))
Output
BenchmarkTools.Trial: 10000 samples with 737 evaluations per sample.
 Range (min … max):  172.999 ns … 489.552 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     181.954 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   183.177 ns ±   8.747 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

        ▁▁▃▂▇▅▅▃  ▂▆█▁▃▁             ▁                           
  ▁▂▄▄▆▇█████████▇██████▅▅▆▅▄▄▅▄▅▅▄▇▆█▅▄▂▃▂▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▄
  173 ns           Histogram: frequency by time          204 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

We can use the @code_warntype macro to identify type instabilities:

julia
@code_warntype example_type_unstable_fn(rows[1])
Output
MethodInstance for example_type_unstable_fn(::Vector{Float64})
  from example_type_unstable_fn(x) @ Main C:devwebsite-devpersonal-website.code-exec_6bf4d0d0b89c736c9f6160f9067cccc477b65204f891a44b7fdda59e074ed8a0.jl:26
Arguments
  #self#::Core.Const(Main.example_type_unstable_fn)
  x::Vector{Float64}
Locals
  @_3::UNION{NOTHING, TUPLE{FLOAT64, INT64}}
  s::UNION{FLOAT64, INT64}
  x_i::Float64
Body::UNION{FLOAT64, INT64}
1 ─       (s = 0)
│   %2  = x::Vector{Float64}
│         (@_3 = Base.iterate(%2))
│   %4  = @_3::UNION{NOTHING, TUPLE{FLOAT64, INT64}}
│   %5  = (%4 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}
│         (x_i = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = s::UNION{FLOAT64, INT64}
│   %12 = x_i::Float64
│         (s = %11 + %12)
│         (@_3 = Base.iterate(%2, %10))
│   %15 = @_3::UNION{NOTHING, TUPLE{FLOAT64, INT64}}
│   %16 = (%15 === nothing)::Bool
│   %17 = Base.not_int(%16)::Bool
└──       goto #4 if not %17
3 ─       goto #2
4 ┄ %20 = s::UNION{FLOAT64, INT64}
└──       return %20

We can see that the variable s is type unstable (i.e. has a union type). This is because the input array has the possibility of being empty. If this array is empty then the loop gets skipped and the function will return 0, which is an integer. If the array is not empty, then the first assignment to s will promote the type to a floating point number.

We can fix this error by using the zero and eltype functions:

julia
function example_type_stable_fn(x)
    s = zero(eltype(x))
    for x_i in x
        s += x_i
    end
    s
end

function calc_summary_stable(rows)
    return sum(example_type_stable_fn, rows)
end

Now let’s benchmark both versions:

julia
display(@benchmark calc_summary($rows))
Output
BenchmarkTools.Trial: 10000 samples with 732 evaluations per sample.
 Range (min … max):  174.590 ns … 702.322 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     181.011 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   192.386 ns ±  44.954 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▅█▆▅▄▂▁                                            ▁          ▁
  ███████▇▆▅▆▄▅▄▅▆▅▅▅▅▆▆▆▇▅▃▄▄▅▄▃▃▃▃▃▅▄▄▄▃▃▅▅▅▃▅▅▅▄▅▇█▆▆▅▆▅▅▅▅▄ █
  175 ns        Histogram: log(frequency) by time        405 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
julia
display(@benchmark calc_summary_stable($rows))
Output
BenchmarkTools.Trial: 10000 samples with 570 evaluations per sample.
 Range (min … max):  204.035 ns …   1.539 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     384.561 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   333.648 ns ± 112.250 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇▇▆▃▁  ▁             ▁█▇▅▄▃▃▂                                 ▂
  █████▇███▇▆▄▅▄▄▅▄▃▄▅▄███████████▆▇▇▇▆▆▆▆▇▇▆▆▆▆▆▅▅▆▆▆▄▄▆▅▅▅▄▅▄ █
  204 ns        Histogram: log(frequency) by time        698 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Making our code type stable ensures our code is very generic and reusable across many types of numbers. In general, it is a good idea to avoid constants in your code, and instead use the zero, one, eltype and typeof functions to construct your constants.

Barriers

There are some occasions when you cannot predict the type being returned from a function, which leads to type instability. There are still some tools one can use to mitigate the performance hits of this. The antidote to this problem is to break up larger functions into several smaller functions, which act as type barriers so that performance critical parts of your code are not affected.

Let’s take the example of using an array to store parameters which are used within a function:

julia
function params_in_array_test(x)
    a = 0
    b = 1
    for i = 1:100
        c = x[1] + a
        d = x[2] / b
        a, b = c, d
    end
    return a, b
end

x = [2.0, 1]
display(@benchmark params_in_array_test($x))
Output
BenchmarkTools.Trial: 10000 samples with 956 evaluations per sample.
 Range (min … max):   89.017 ns … 800.314 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      92.887 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   100.729 ns ±  31.113 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁█▆▄▃▁                                                    ▁   ▁
  ████████▇▇▇▆▅▅▆▅▆▆▅▅▅▄▅▄▅▅▄▅▃▄▅▅▅▄▄▅▅▄▄▅▄▄▅▄▅▄▃▄▅▅▅▃▃▄▃▃▅▂█▄▅ █
  89 ns         Histogram: log(frequency) by time        250 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

The compiler has “promoted” the integer - 1 - to a floating point number, so that the numbers in the array can match. Now, let’s imagine that we add another parameter to this array, which is of a different type:

julia
x = [2.0, 1, "third"]
display(@benchmark params_in_array_test($x))
Output
BenchmarkTools.Trial: 10000 samples with 8 evaluations per sample.
 Range (min … max):  3.000 μs … 164.950 μs  ┊ GC (min … max): 0.00% … 94.80%
 Time  (median):     3.438 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.101 μs ±   3.604 μs  ┊ GC (mean ± σ):  1.80% ±  2.87%

  ▆█▆▄▃▁▁▁ ▄▃▂▁▁                                              ▂
  ████████▇███████▆▆▆▅▅▆▄▅▅▅▅▄▅▄▄▅▅▅▅▃▅▅▁▄▃▁▃▃▄▄▄▁▅▄▄▅▃▃▁▄▄▁▄ █
  3 μs         Histogram: log(frequency) by time      18.6 μs <

 Memory estimate: 3.16 KiB, allocs estimate: 201.

Now, even though the two parts of the x array are unchanged, all we did was add a third parameter, we suddenly have a significant decrease in speed! This is because x now has to be an array of type Any.

We can remedy this hit, without changing the input types, by using a barrier function:

julia
function _params_in_array_test_loop(x1, x2, a, b)
    for i = 1:100
        c = x1+a
        d = x2 / b
        a, b = c, d
    end
    return a, b
end

function params_in_array_test_with_barrier(x)
    a = 0
    b = 1
    return _params_in_array_test_loop(x[1], x[2], a, b)
end

display(@benchmark params_in_array_test_with_barrier($x))
Output
BenchmarkTools.Trial: 10000 samples with 372 evaluations per sample.
 Range (min … max):  241.935 ns …   4.871 μs  ┊ GC (min … max): 0.00% … 91.56%
 Time  (median):     256.720 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   293.948 ns ± 100.543 ns  ┊ GC (mean ± σ):  0.27% ±  1.30%

  ▁█▇▆▄▂▁         ▆ ▃▂▂▁                                        ▁
  ████████▇▇▇▆▅▆▅▆█▇██████▇▇▆▅▅▅▄▆▅▅▅▅▄▅▅▅▅▆▅▅▅▅▄▄▅▆▅▅▅▄▄▅▅▆▅▃▅ █
  242 ns        Histogram: log(frequency) by time        662 ns <

 Memory estimate: 32 bytes, allocs estimate: 1.

This refactor allows the bulk of the processing to occur in a type stable manner, avoiding the performance hit when it is not possible to completely remove type instabilities.

Using Composite Types

In Julia, we can define our own composite types using struct. By default, structs are immutable. Structs are a common place that people introduce type instability. Take the example of defining one’s own complex number:

julia
struct MyComplex
    real
    imag
end

However, this type is not type stable. When we defined the type previously, we implicitly labelled both the real and imag variables as type Any. We can fix this by specifying a type:

julia
struct MyComplexFixed
    real::Float64
    imag::Float64
end

However now, this type will only work with type Float64. Instead, we can use a feature of Julia called generics, which allows us to specify the type of the parameters inside a struct:

julia
struct MyComplexGeneric{T<:Number}
    real::T
    imag::T
end

# Example usage
c1 = MyComplexGeneric(1.0, 2.0)
c2 = MyComplexGeneric(1, 2)

If you have a performance critical struct, make sure that the types are well-defined. If the struct is a collection of numeric types, the resulting composition will also be a numeric type and hence allowed to be stack allocated.