Type Stability
While Julia is a dynamic language, it is very important for the compiler to know the types of the variables being used, so that it can specialise on them. We have been alluding to this throughout the book so far, and getting it wrong can have a huge performance impact. Let’s look at a concrete example:
function get_fibonacci(n)
    a = 1
    b = 1
    fibonacci_nums = []
    push!(fibonacci_nums, a)
    push!(fibonacci_nums, b)
    for i = 1:n-2
        a, b = b, a + b
        push!(fibonacci_nums, b)
    end
    return fibonacci_nums
end
This calculates the first n Fibonacci numbers and stores the results in an array. Because the array was initialised as [], the result is of type Vector{Any}. Let’s see what effect this has on performance:
fibonacci_nums = get_fibonacci(50);
typeof(fibonacci_nums)
import BenchmarkTools: @benchmark, @btime, @belapsed
display(@benchmark sum($fibonacci_nums))
BenchmarkTools.Trial: 10000 samples with 196 evaluations per sample.
Range (min … max): 471.689 ns … 4.580 μs ┊ GC (min … max): 0.00% … 85.51%
Time (median): 511.337 ns ┊ GC (median): 0.00%
Time (mean ± σ): 517.249 ns ± 80.263 ns ┊ GC (mean ± σ): 0.97% ± 4.55%
█▃ ▁▇
▂▂▂▂▂▂▂▃▃▃██▅██▇▅▅▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▁▂▁▁▁▁▂▁▂▁▂▂▂▂ ▃
472 ns Histogram: frequency by time 645 ns <
Memory estimate: 608 bytes, allocs estimate: 38.
We can see that this sum made many allocations and took a significant amount of time. Let’s simply copy the elements into another array, which lets Julia infer a concrete element type, and see the difference in performance:
fibonacci_nums_with_type = [f for f in fibonacci_nums];
typeof(fibonacci_nums_with_type)
display(@benchmark sum($fibonacci_nums_with_type))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 4.597 ns … 9.152 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.928 ns ┊ GC (median): 0.00%
Time (mean ± σ): 4.938 ns ± 0.172 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▅█▅▂ ▂▃▃▁
▂▃▄▄▄▃▃▂▂▂▂▂▂▂▂▂▁▂▂▂▂▃▃▃▃▂▂▂▂▃▄█████▅▃▃▂▃▅████▇▄▃▂▂▂▂▂▂▂▂ ▃
4.6 ns Histogram: frequency by time 5.14 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
Notice that the allocations were reduced to zero and the runtime improved dramatically. In this example, once Julia knew the type of the elements, performance increased by a factor of almost 120. The reason performance was so low for the first method is that Julia had to check the type of each individual element in the array at runtime.
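One way to avoid the problem from the start, sketched below (the name get_fibonacci_typed is ours, not from the benchmarks above), is to give the accumulator array a concrete element type when constructing it:

```julia
# A type-stable variant: Int[] creates a Vector{Int}, so the compiler
# knows the concrete type of every element up front.
function get_fibonacci_typed(n)
    a = 1
    b = 1
    fibonacci_nums = Int[]   # instead of [], which gives Vector{Any}
    push!(fibonacci_nums, a)
    push!(fibonacci_nums, b)
    for i = 1:n-2
        a, b = b, a + b
        push!(fibonacci_nums, b)
    end
    return fibonacci_nums
end
```

With this change, no element-by-element type checks are needed when summing the result.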
Understanding Type Instability
If the compiler cannot trace the types of the variables used throughout a function (i.e. give each variable a known concrete type at compile time), then the function is said to be type unstable. The compiler then has to perform additional checks on the unknown-typed variables at runtime, which causes performance hits.
Let’s take a look at an example of a type unstable function:
function example_type_unstable_fn(x)
    s = 0
    for x_i in x
        s += x_i
    end
    s
end
This looks like a normal implementation. We can even run and benchmark it to check that it works:
x = collect(1:100);
display(@benchmark example_type_unstable_fn($x))
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 6.375 ns … 15.484 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.424 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.434 ns ± 0.151 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▃ ▅▅▆▁▇▇█ ██ ▅▅▃ ▁▁
▂▂▁▂▂▂▂▂▃▃▃▃▄▅▄▆▇█▆██▇██████████████▆██▅▇▆▅▃▄▄▃▂▃▃▂▂▂▂▂▂▂▂ ▄
6.38 ns Histogram: frequency by time 6.48 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
So far there are no visible issues with this code, but let’s construct an example where the performance problem shows up:
function calc_summary(rows)
    return sum(example_type_unstable_fn, rows)
end
rows = [rand(rand(0:10)) for _ in 1:100];
display(@benchmark calc_summary($rows))
BenchmarkTools.Trial: 10000 samples with 759 evaluations per sample.
Range (min … max): 165.705 ns … 205.926 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 170.147 ns ┊ GC (median): 0.00%
Time (mean ± σ): 170.480 ns ± 2.322 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▁▂ ▃█▅▂ ▇ ▃▇▆▅▄ ▁▁ ▄ ▁
▁▁▁▁▃▅▄▅▅▃▆▅▆████████▇████████▆▆███▇█▅▆▇▇██▇▆▅█▇▇▅▆▅▃▃▂▃▂▂▂▂▂ ▅
166 ns Histogram: frequency by time 176 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
We can use the @code_warntype macro to identify type instabilities:
@code_warntype example_type_unstable_fn(rows[1])
MethodInstance for example_type_unstable_fn(::Vector{Float64})
from example_type_unstable_fn(x) @ Main ~/dev/website-dev/personal-website/.code-exec/_6bf4d0d0b89c736c9f6160f9067cccc477b65204f891a44b7fdda59e074ed8a0.jl:26
Arguments
#self#::Core.Const(Main.example_type_unstable_fn)
x::Vector{Float64}
Locals
@_3::Union{Nothing, Tuple{Float64, Int64}}
s::Union{Float64, Int64}
x_i::Float64
Body::Union{Float64, Int64}
1 ─ (s = 0)
│ %2 = x::Vector{Float64}
│ (@_3 = Base.iterate(%2))
│ %4 = @_3::Union{Nothing, Tuple{Float64, Int64}}
│ %5 = (%4 === nothing)::Bool
│ %6 = Base.not_int(%5)::Bool
└── goto #4 if not %6
2 ┄ %8 = @_3::Tuple{Float64, Int64}
│ (x_i = Core.getfield(%8, 1))
│ %10 = Core.getfield(%8, 2)::Int64
│ %11 = s::Union{Float64, Int64}
│ %12 = x_i::Float64
│ (s = %11 + %12)
│ (@_3 = Base.iterate(%2, %10))
│ %15 = @_3::Union{Nothing, Tuple{Float64, Int64}}
│ %16 = (%15 === nothing)::Bool
│ %17 = Base.not_int(%16)::Bool
└── goto #4 if not %17
3 ─ goto #2
4 ┄ %20 = s::Union{Float64, Int64}
└── return %20
We can see that the variable s is type unstable (it has the union type Union{Float64, Int64}). This is because the input array could be empty: if it is, the loop is skipped and the function returns 0, which is an integer. If the array is not empty, the first s += x_i promotes s to a floating point number.
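We can observe this union directly (the function is repeated here so the snippet stands alone): on an empty array the loop never runs and an integer comes back, while on a non-empty array the accumulator is promoted to Float64.

```julia
# The same type-unstable summation as above.
function example_type_unstable_fn(x)
    s = 0
    for x_i in x
        s += x_i
    end
    s
end

# Empty input: the loop is skipped, so we get back the integer literal 0.
typeof(example_type_unstable_fn(Float64[]))   # an Int, not a Float64

# Non-empty input: s += x_i promotes the accumulator to Float64.
typeof(example_type_unstable_fn([1.0, 2.0]))  # Float64
```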
We can fix this instability by using the zero and eltype functions:
function example_type_stable_fn(x)
    s = zero(eltype(x))
    for x_i in x
        s += x_i
    end
    s
end

function calc_summary_stable(rows)
    return sum(example_type_stable_fn, rows)
end
Now let’s benchmark both versions:
display(@benchmark calc_summary($rows))
BenchmarkTools.Trial: 10000 samples with 755 evaluations per sample.
Range (min … max): 165.983 ns … 231.140 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 169.434 ns ┊ GC (median): 0.00%
Time (mean ± σ): 169.792 ns ± 2.290 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▁▂▄█▄▃ █▁▁ ▃▂
▁▁▃▄▃▄▆▅▇▆██████████████████▆▅▄▅▅▄▄▄▄▃▃▂▂▃▂▃▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁ ▃
166 ns Histogram: frequency by time 177 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
display(@benchmark calc_summary_stable($rows))
BenchmarkTools.Trial: 10000 samples with 867 evaluations per sample.
Range (min … max): 138.367 ns … 185.163 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 149.389 ns ┊ GC (median): 0.00%
Time (mean ± σ): 149.611 ns ± 3.572 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▂▂▄▃▃▇▄█▅▇▇▅▅▅▅▃▆▃▂ ▁
▂▂▂▁▂▂▂▂▂▃▃▄▄▄▄▄▄▄▆████████████████████████▇▇▆▆▅▆▅▄▄▄▄▃▃▄▃▃▃▃ ▅
138 ns Histogram: frequency by time 159 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
Making our code type stable in this way also keeps it generic and reusable across many numeric types. In general, it is a good idea to avoid hard-coded literal constants in your code, and instead construct them with the zero, one, eltype and typeof functions.
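As a minimal sketch of this advice (the increment function is purely illustrative), constants built from the argument’s type follow that type instead of forcing a promotion:

```julia
# one(typeof(x)) produces a 1 of the same type as x, so the addition
# never promotes: Float32 stays Float32, Int stays Int.
increment(x) = x + one(typeof(x))

increment(1.0f0)  # a Float32 result, no silent promotion to Float64
increment(2)      # an Int result
```

Had we written x + 1 instead, a Float32 input would have been promoted to Float64 by the Int literal in some generic contexts, and integer inputs would mix types with floating-point accumulators.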
Function Barriers
There are some occasions when you cannot predict the type being returned from a function, which leads to type instability. There are still tools to mitigate the resulting performance hit. The main antidote is to break a large function into several smaller functions, which act as type barriers so that the performance-critical parts of your code are unaffected.
Let’s take the example of using an array to store parameters which are used within a function:
function params_in_array_test(x)
    a = 0
    b = 1
    for i = 1:100
        c = x[1] + a
        d = x[2] / b
        a, b = c, d
    end
    return a, b
end
x = [2.0, 1]
display(@benchmark params_in_array_test($x))
BenchmarkTools.Trial: 10000 samples with 958 evaluations per sample.
Range (min … max): 88.868 ns … 115.498 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 89.403 ns ┊ GC (median): 0.00%
Time (mean ± σ): 89.813 ns ± 1.277 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂▅▇███▇▅▄▁ ▁▂▃▃▃▂▁▂▁ ▁▂▃▃▂▁ ▃
▇██████████▆▆▆█████████████▆▅▄▅▆▅▆▆▅▄▃▅▃▅▆▃▆▅▄▅▄▃▃▁▃▄▇██████ █
88.9 ns Histogram: log(frequency) by time 94.2 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
Here Julia has promoted the integer 1 to a floating point number, so that all elements of the array share the type Float64. Now, let’s imagine that we add another parameter of a different type to this array:
x = [2.0, 1, "third"]
display(@benchmark params_in_array_test($x))
BenchmarkTools.Trial: 10000 samples with 9 evaluations per sample.
Range (min … max): 2.341 μs … 49.739 μs ┊ GC (min … max): 0.00% … 92.81%
Time (median): 2.504 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.606 μs ± 1.287 μs ┊ GC (mean ± σ): 1.85% ± 3.58%
▂▃▄▆███▇▅▃▁▁▁▁▁▁▁ ▁▁▁ ▂
▅███████████████████▇███▇▇▆▆▅▆▅▇▇██████▇▆▄▅▅▂▄▅▅▄▄▅▅▅▂▃▂▄▄ █
2.34 μs Histogram: log(frequency) by time 3.68 μs <
Memory estimate: 3.16 KiB, allocs estimate: 201.
Now, even though the first two elements of x are unchanged (all we did was add a third element), we suddenly see a significant slowdown! This is because x now has to be an array of element type Any.
We can remedy this hit, without changing the input types, by using a barrier function:
function _params_in_array_test_loop(x1, x2, a, b)
    for i = 1:100
        c = x1 + a
        d = x2 / b
        a, b = c, d
    end
    return a, b
end

function params_in_array_test_with_barrier(x)
    a = 0
    b = 1
    return _params_in_array_test_loop(x[1], x[2], a, b)
end
display(@benchmark params_in_array_test_with_barrier($x))
BenchmarkTools.Trial: 10000 samples with 401 evaluations per sample.
Range (min … max): 241.975 ns … 2.344 μs ┊ GC (min … max): 0.00% … 87.83%
Time (median): 242.082 ns ┊ GC (median): 0.00%
Time (mean ± σ): 244.665 ns ± 30.188 ns ┊ GC (mean ± σ): 0.17% ± 1.24%
█ ▄▂ ▁ ▅ ▁
█▆▆▆▇███▇█▆▆▆▆▆▆▅▅▆█▅▅██▅▅▅██▆▅▆▆▅▅▅▄▅▅▅▅▅▅▄▅▆▄▃▃▁▄▃▄▅▅▄▄▅▄▅ █
242 ns Histogram: log(frequency) by time 271 ns <
Memory estimate: 32 bytes, allocs estimate: 1.
This refactor allows the bulk of the processing to occur in a type-stable manner, mitigating the performance hit when the instability cannot be removed entirely.
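As an aside, when the number of parameters is fixed, another option (not benchmarked above) is to store them in a tuple rather than an array: a tuple records a concrete type for every slot, so indexing it is type stable even with mixed element types.

```julia
# A tuple of mixed types: its type is Tuple{Float64, Int, String},
# so x_tuple[1] is known to be a Float64 at compile time.
x_tuple = (2.0, 1, "third")

# Arithmetic on the numeric slots is type stable; no Any boxing occurs.
x_tuple[1] + x_tuple[2]  # 3.0
```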
Using Composite Types
In Julia, we can define our own composite types using struct. By default, structs are immutable. Struct definitions are a common place where people introduce type instability. Take the example of defining one’s own complex number:
struct MyComplex
    real
    imag
end
However, this definition is a source of type instability: by leaving the fields unannotated, we implicitly gave both real and imag the type Any. We can fix this by specifying a field type:
struct MyComplexFixed
    real::Float64
    imag::Float64
end
However, this type will now only work with Float64. Instead, we can use Julia’s parametric types, which let us specify the field types with a type parameter:
struct MyComplexGeneric{T<:Number}
    real::T
    imag::T
end

# Example usage
c1 = MyComplexGeneric(1.0, 2.0)
c2 = MyComplexGeneric(1, 2)
If you have a performance-critical struct, make sure that its field types are concrete. If the struct is an immutable collection of concretely-typed numeric fields, instances are plain bits values and can be stack allocated.
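We can check this property with the built-in isbitstype function; the struct is redefined here so the snippet stands alone:

```julia
# Same parametric struct as above.
struct MyComplexGeneric{T<:Number}
    real::T
    imag::T
end

# A concrete numeric instantiation is an isbits type: immutable, with
# concretely-typed fields, so it can live on the stack or inline in arrays.
isbitstype(MyComplexGeneric{Float64})  # true

# A heap-allocated container, by contrast, is not.
isbitstype(Vector{Float64})            # false
```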