Type Stability

While Julia is a dynamic language, it can be very important for the compiler to know the types of the variables being used, so that it can specialise on them. We have been alluding to this throughout the book so far, but getting this wrong can have a huge performance impact. Let’s look at a concrete example:

julia
function get_fibonacci(n)
    a = 1
    b = 1
    fibonacci_nums = []
    push!(fibonacci_nums, a)
    push!(fibonacci_nums, b)
    for i = 1:n-2
        a, b = b, a+b
        push!(fibonacci_nums, b)
    end
    return fibonacci_nums
end

This calculates the first n Fibonacci numbers and stores the results in an array. Because the array was initialised empty with no element type, the result is of type Vector{Any}. Let’s see what effect this has on performance:

julia
fibonacci_nums = get_fibonacci(50);
typeof(fibonacci_nums)
julia
import BenchmarkTools: @benchmark, @btime, @belapsed
display(@benchmark sum($fibonacci_nums))
Output
BenchmarkTools.Trial: 10000 samples with 196 evaluations per sample.
 Range (min … max):  471.689 ns …  4.580 μs  ┊ GC (min … max): 0.00% … 85.51%
 Time  (median):     511.337 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   517.249 ns ± 80.263 ns  ┊ GC (mean ± σ):  0.97% ±  4.55%

            █▃ ▁▇                                               
  ▂▂▂▂▂▂▂▃▃▃██▅██▇▅▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▁▂▁▁▁▁▂▁▂▁▂▂▂▂ ▃
  472 ns          Histogram: frequency by time          645 ns <

 Memory estimate: 608 bytes, allocs estimate: 38.

We can see that this sum performed many allocations and took a significant amount of time. If we simply copy the data into an array with a concrete element type (a comprehension over a Vector{Any} narrows the element type to match the actual contents), we can see the difference in performance:

julia
fibonacci_nums_with_type = [f for f in fibonacci_nums];
typeof(fibonacci_nums_with_type)
julia
display(@benchmark sum($fibonacci_nums_with_type))
Output
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  4.597 ns … 9.152 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.928 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.938 ns ± 0.172 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                 ▁▅█▅▂      ▂▃▃▁             
  ▂▃▄▄▄▃▃▂▂▂▂▂▂▂▂▂▁▂▂▂▂▃▃▃▃▂▂▂▂▃▄█████▅▃▃▂▃▅████▇▄▃▂▂▂▂▂▂▂▂ ▃
  4.6 ns         Histogram: frequency by time       5.14 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Notice that the allocations were reduced to zero. Additionally, the runtime improved dramatically: in this example, when Julia knew the type of the variables, performance increased by a factor of almost 120. Performance was so poor for the first method because Julia had to check the type of each individual element of the array at runtime before it could add it to the sum.
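Better still, we can avoid the problem at the source rather than copying afterwards, by giving the container a concrete element type up front. A minimal sketch (the name get_fibonacci_typed is our own, and we assume plain Int results are acceptable):

```julia
function get_fibonacci_typed(n)
    a, b = 1, 1
    fibonacci_nums = Int[]  # concrete element type: this is a Vector{Int}
    push!(fibonacci_nums, a)
    push!(fibonacci_nums, b)
    for i = 1:n-2
        a, b = b, a + b
        push!(fibonacci_nums, b)
    end
    return fibonacci_nums
end
```

With this version, no conversion pass is needed before summing: the array is type stable from the start.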

Understanding Type Instability

If the compiler cannot infer the types of the variables used throughout a function (i.e. each variable having a known concrete type at compile time), then the function is said to be type unstable. This means that the compiler has to insert additional runtime checks on variables with unknown types, causing performance hits.

Let’s take a look at an example of a type unstable function:

julia
function example_type_unstable_fn(x)
    s = 0
    for x_i in x
        s += x_i
    end
    s
end

This looks like a normal implementation. We can even run and benchmark it to check that it works:

julia
x = collect(1:100);
display(@benchmark example_type_unstable_fn($x))
Output
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
 Range (min … max):  6.375 ns … 15.484 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.424 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.434 ns ±  0.151 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                     ▁▃ ▅▅▆▁▇▇█ ██ ▅▅▃ ▁▁                     
  ▂▂▁▂▂▂▂▂▃▃▃▃▄▅▄▆▇█▆██▇██████████████▆██▅▇▆▅▃▄▄▃▂▃▃▂▂▂▂▂▂▂▂ ▄
  6.38 ns        Histogram: frequency by time        6.48 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

There are no visible issues with this code so far, but let’s look at an example where some performance issues emerge:

julia
function calc_summary(rows)
    return sum(example_type_unstable_fn, rows)
end

rows = [rand(rand(0:10)) for _ in 1:100];
display(@benchmark calc_summary($rows))
Output
BenchmarkTools.Trial: 10000 samples with 759 evaluations per sample.
 Range (min … max):  165.705 ns … 205.926 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     170.147 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   170.480 ns ±   2.322 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

               ▂▁▂ ▃█▅▂ ▇ ▃▇▆▅▄   ▁▁       ▄    ▁                
  ▁▁▁▁▃▅▄▅▅▃▆▅▆████████▇████████▆▆███▇█▅▆▇▇██▇▆▅█▇▇▅▆▅▃▃▂▃▂▂▂▂▂ ▅
  166 ns           Histogram: frequency by time          176 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

We can use the @code_warntype macro to identify type instabilities:

julia
@code_warntype example_type_unstable_fn(rows[1])
Output
MethodInstance for example_type_unstable_fn(::Vector{Float64})
  from example_type_unstable_fn(x) @ Main ~/dev/website-dev/personal-website/.code-exec/_6bf4d0d0b89c736c9f6160f9067cccc477b65204f891a44b7fdda59e074ed8a0.jl:26
Arguments
  #self#::Core.Const(Main.example_type_unstable_fn)
  x::Vector{Float64}
Locals
  @_3::UNION{NOTHING, TUPLE{FLOAT64, INT64}}
  s::UNION{FLOAT64, INT64}
  x_i::Float64
Body::UNION{FLOAT64, INT64}
1 ─       (s = 0)
│   %2  = x::Vector{Float64}
│         (@_3 = Base.iterate(%2))
│   %4  = @_3::UNION{NOTHING, TUPLE{FLOAT64, INT64}}
│   %5  = (%4 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}
│         (x_i = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = s::UNION{FLOAT64, INT64}
│   %12 = x_i::Float64
│         (s = %11 + %12)
│         (@_3 = Base.iterate(%2, %10))
│   %15 = @_3::UNION{NOTHING, TUPLE{FLOAT64, INT64}}
│   %16 = (%15 === nothing)::Bool
│   %17 = Base.not_int(%16)::Bool
└──       goto #4 if not %17
3 ─       goto #2
4 ┄ %20 = s::UNION{FLOAT64, INT64}
└──       return %20

We can see that the variable s is type unstable (it has a union type, Union{Float64, Int64}). This is because s is initialised to the integer 0, while the input array may be empty. If the array is empty, the loop is skipped and the function returns 0, which is an Int64. If the array is not empty, the first addition promotes s to a Float64.
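We can confirm this behaviour directly by checking the return type for empty and non-empty inputs (a quick check using the function defined above):

```julia
typeof(example_type_unstable_fn(Float64[]))   # Int64:   the loop is skipped, so s stays 0
typeof(example_type_unstable_fn([1.0, 2.0]))  # Float64: s is promoted by the first addition
```

A caller summing over many such results must therefore handle both possible return types, which is exactly what slows down calc_summary.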

We can fix this instability by using the zero and eltype functions:

julia
function example_type_stable_fn(x)
    s = zero(eltype(x))
    for x_i in x
        s += x_i
    end
    s
end

function calc_summary_stable(rows)
    return sum(example_type_stable_fn, rows)
end

Now let’s benchmark both versions:

julia
display(@benchmark calc_summary($rows))
Output
BenchmarkTools.Trial: 10000 samples with 755 evaluations per sample.
 Range (min … max):  165.983 ns … 231.140 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     169.434 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   169.792 ns ±   2.290 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▁▁▂▄█▄▃ █▁▁ ▃▂                                      
  ▁▁▃▄▃▄▆▅▇▆██████████████████▆▅▄▅▅▄▄▄▄▃▃▂▂▃▂▃▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁ ▃
  166 ns           Histogram: frequency by time          177 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
julia
display(@benchmark calc_summary_stable($rows))
Output
BenchmarkTools.Trial: 10000 samples with 867 evaluations per sample.
 Range (min … max):  138.367 ns … 185.163 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     149.389 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   149.611 ns ±   3.572 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       ▁▂▂▄▃▃▇▄█▅▇▇▅▅▅▅▃▆▃▂ ▁                    
  ▂▂▂▁▂▂▂▂▂▃▃▄▄▄▄▄▄▄▆████████████████████████▇▇▆▆▅▆▅▄▄▄▄▃▃▄▃▃▃▃ ▅
  138 ns           Histogram: frequency by time          159 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Making our code type stable in this way also keeps it generic and reusable across many numeric types. In general, it is a good idea to avoid hard-coded literal constants in your code, and to instead construct constants with the zero, one, eltype and typeof functions.
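As a quick illustration of what these functions return (a sketch; the exact integer type assumes a 64-bit machine):

```julia
zero(Float32)            # 0.0f0, a Float32 zero
one(Int)                 # 1, an Int one

x = rand(Float32, 4)
zero(eltype(x))          # 0.0f0, matches the element type of x
typeof(zero(eltype(x)))  # Float32
```

Writing accumulators as zero(eltype(x)) rather than the literal 0 means the same function body compiles to efficient, specialised code for Float32, Float64, Int, or any other numeric element type.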

Barriers

There are some occasions when you cannot predict the type returned from a function, which leads to type instability. There are still tools to mitigate the resulting performance hits. The standard remedy is to break a large function into several smaller ones, which act as function barriers so that the performance-critical parts of your code are unaffected.

Let’s take the example of using an array to store parameters which are used within a function:

julia
function params_in_array_test(x)
    a = 0
    b = 1
    for i = 1:100
        c = x[1] + a
        d = x[2] / b
        a, b = c, d
    end
    return a, b
end

x = [2.0, 1]
display(@benchmark params_in_array_test($x))
Output
BenchmarkTools.Trial: 10000 samples with 958 evaluations per sample.
 Range (min … max):  88.868 ns … 115.498 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     89.403 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   89.813 ns ±   1.277 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▂▅▇███▇▅▄▁    ▁▂▃▃▃▂▁▂▁                              ▁▂▃▃▂▁ ▃
  ▇██████████▆▆▆█████████████▆▅▄▅▆▅▆▆▅▄▃▅▃▅▆▃▆▅▄▅▄▃▃▁▃▄▇██████ █
  88.9 ns       Histogram: log(frequency) by time      94.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

The compiler has “promoted” the integer 1 to a floating point number, so that the elements of the array share a common type. Now, let’s imagine that we add a parameter of a different type to this array:

julia
x = [2.0, 1, "third"]
display(@benchmark params_in_array_test($x))
Output
BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
 Range (min … max):  2.341 μs … 49.739 μs  ┊ GC (min … max): 0.00% … 92.81%
 Time  (median):     2.504 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.606 μs ±  1.287 μs  ┊ GC (mean ± σ):  1.85% ±  3.58%

   ▂▃▄▆███▇▅▃▁▁▁▁▁▁▁                ▁▁▁                      ▂
  ▅███████████████████▇███▇▇▆▆▅▆▅▇▇██████▇▆▄▅▅▂▄▅▅▄▄▅▅▅▂▃▂▄▄ █
  2.34 μs      Histogram: log(frequency) by time     3.68 μs <

 Memory estimate: 3.16 KiB, allocs estimate: 201.

Now, even though the first two elements of the x array are unchanged (all we did was add a third element), we suddenly see a significant slowdown! This is because x now has to be an array with element type Any.
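We can confirm what happened to the element type of the array literal (a quick check):

```julia
typeof([2.0, 1])           # Vector{Float64}: the integer 1 was promoted
typeof([2.0, 1, "third"])  # Vector{Any}: no common promotion exists with a String
```

Once the element type is Any, every x[1] and x[2] access inside the loop returns a boxed value whose type must be checked at runtime, which is where the extra allocations come from.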

We can remedy this hit, without changing the input types, by using a barrier function:

julia
function _params_in_array_test_loop(x1, x2, a, b)
    for i = 1:100
        c = x1+a
        d = x2 / b
        a, b = c, d
    end
    return a, b
end

function params_in_array_test_with_barrier(x)
    a = 0
    b = 1
    return _params_in_array_test_loop(x[1], x[2], a, b)
end

display(@benchmark params_in_array_test_with_barrier($x))
Output
BenchmarkTools.Trial: 10000 samples with 401 evaluations per sample.
 Range (min … max):  241.975 ns …  2.344 μs  ┊ GC (min … max): 0.00% … 87.83%
 Time  (median):     242.082 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   244.665 ns ± 30.188 ns  ┊ GC (mean ± σ):  0.17% ±  1.24%

  █    ▄▂  ▁            ▅                                      ▁
  █▆▆▆▇███▇█▆▆▆▆▆▆▅▅▆█▅▅██▅▅▅██▆▅▆▆▅▅▅▄▅▅▅▅▅▅▄▅▆▄▃▃▁▄▃▄▅▅▄▄▅▄▅ █
  242 ns        Histogram: log(frequency) by time       271 ns <

 Memory estimate: 32 bytes, allocs estimate: 1.

This refactor allows the bulk of the processing to occur in a type stable manner, mitigating the performance hit when the type instability cannot be removed entirely.

Using Composite Types

In Julia, we can define our own composite types using struct. By default, structs are immutable. Structs are a common place where people introduce type instability. Take the example of defining one’s own complex number type:

julia
struct MyComplex
    real
    imag
end

However, this type is not type stable. By omitting field annotations, we implicitly declared both the real and imag fields as type Any. We can fix this by specifying a concrete type:

julia
struct MyComplexFixed
    real::Float64
    imag::Float64
end

However, this type will now only work with Float64 values. Instead, we can use parametric types (Julia’s form of generics), which allow us to parameterise the field types of a struct:

julia
struct MyComplexGeneric{T<:Number}
    real::T
    imag::T
end

# Example usage
c1 = MyComplexGeneric(1.0, 2.0)
c2 = MyComplexGeneric(1, 2)

If you have a performance-critical struct, make sure that its field types are concrete. If the struct is an immutable collection of plain bits types, the resulting type is itself a bits type and can therefore be stack allocated.
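One way to check this property is with the built-in isbitstype function (a sketch using the structs defined above):

```julia
isbitstype(MyComplexGeneric{Float64})  # true: plain immutable bits, stack-allocatable
isbitstype(MyComplex)                  # false: its Any fields hold heap references
```

Arrays of bits types are also stored inline as contiguous memory, so a Vector{MyComplexGeneric{Float64}} is as cache-friendly as a Vector{Float64}, whereas a Vector{MyComplex} is an array of pointers.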