Futhark 0.16.1 released

Posted on July 7, 2020 by Troels Henriksen

Futhark 0.16.1 has just been released (full changelog). There has actually been quite a number of releases since 0.15.1. We made it all the way to 0.15.8, but most were relatively small bugfix releases, in particular to patch up problems with the then-new size types. While not all of the seven releases were purely about bugfixes, none of them felt interesting enough that it was worth writing a release announcement. Well, 0.16.1 is different, as it finally enables incremental flattening, a novel technique for compiling nested parallelism that we have worked on for years. Its fundamental concept and implementation has been the same for over a year and a half, but it was only recently that we finally worked out the last major kinks and practical problems, so that we now feel we can offer it to users by default. The main remaining deficiency is an increase in compile times for some programs (as discussed previously), but this is just a price we will have to pay. I also expect we will be chasing bugs for a while, as incremental flattening produces code that is significantly more complex than what we had before.

I won’t talk much more about incremental flattening in this announcement - see the original blog post for that - but let’s look at a few of the other notable changes, fixes, and improvements we have made since 0.15.1. The total set of changes is vast, so I’m only going to mention those that others might find interesting to read about.

Incompatible changes

To get it out of the way, these are the incompatible changes that make this version 0.16.1 rather than 0.15.9:

The C# backend has been removed. It took nontrivial effort to maintain, held back general compiler improvements that required backend changes, and (most importantly) was not used by anyone. Instead of maintaining a lot of different backends, our strategy is now to recommend bridges that allow various languages to conveniently call code generated by Futhark’s various C backends. As part of this, we significantly improved the documentation. We are still going to keep the Python backend for its convenience, and we will probably also add a JavaScript or WebAssembly backend in the future.
The unsafe keyword has been replaced with the #[unsafe] attribute.
Futhark 0.15.8 added warnings for literals that are inferred to have a type for which they would be out-of-bounds, such as 300 : i8. These warnings are now errors, since I cannot think of any good reason to ever use an out-of-bounds literal. (For technical reasons, typed literals like 300i8 are still permitted, as pointed out by the person who actually implemented this.)
Type ascriptions on entry points now always result in opaque types in the C API when the underlying concrete type is a tuple. Previously, the compiler would try hard to exploit the tuple structure, with sometimes undesirable effects. Now things are a bit more rigid, but also more predictable.

Tightening causality

Size types mostly behave as one would expect, but when anonymous sizes are involved, things become more complex. An anonymous size occurs when a function returns an array of a size that cannot be known in advance, such as for example with filter. The most subtle issue is ensuring that when some size is needed, it has already been computed. This is ensured by the causality restriction, enforced by the type checker. The causality restriction is most fundamentally about ensuring that it is possible to execute the program one expression after another, without having to resort to time travel. However, the causality restriction is more rigid than this implies, as it enforces that a specific evaluation order (roughly left-to-right) will result in all sizes being computed before they are needed. The motivation is not just to ensure a simpler compiler, but also to make the rules easier to understand for programmers.

As an example of a causality violation, consider this function definition:

let f (b: bool) (xs: []i32) =
  let a = [] : [][]i32
  let b = [filter (>0) xs]
  in a[0] == b[0]

The comparison on the last line forces the row size of a and b to be the same, let’s say n. Further, while the empty array literal can be given any row size, that n must be the size of whatever array is produced by the filter. But now we have a problem: constructing the empty array requires us to know the specific value of n, but it is not computed until later! This is called a causality violation: we need a value before it is available.

The causality restriction has a particularly subtle interaction with higher-order functions, particularly the pipe operators <| and |>. Programmers familiar with other languages, in particular Haskell, may wish to use the <| operator frequently, due to its similarity to Haskell’s $ operator. Unfortunately, it has pitfalls due to causality. Consider this expression:

length <| filter (>0) [1,-2,3]

This is a causality violation. The reason is that length has the following type scheme:

val length [n] 't : [n]t -> i32

This means that whenever we use length, the type checker must instantiate the size variable n with some specific size, which must be available at the place length itself occurs. In the expression above, this specific size is whatever anonymous size variable the filter application produces. However, since the rule for binary operators is left-to-right evaluation, length function is instantiated (but not applied!) before the filter runs. The distinction between instantiation, which is when a polymorphic value is given its concrete type, and application, which is when a function is provided with an argument, is crucial here. The end result is that the compiler will complain:

> length <| filter (>0) [1,-2,3]
Error at [1]> :1:1-6:
Causality check: size "ret₁₁" needed for type of "length":
  [ret₁₁]i32 -> i32
But "ret₁₁" is computed at 1:11-30.
Hint: Bind the expression producing "ret₁₁" with 'let' beforehand.

The compiler suggests binding the filter expression with a let, which forces it to be evaluated first, but there are neater solutions in this case. For example, we can exploit that function arguments are evaluated before function is instantiated:

> length (filter (>0) [1,-2,3])
2i32

Or we can use the left-to-right piping operator:

> filter (>0) [1,-2,3] |> length
2i32

Due to a bug, this check was not done properly before 0.16.1, so the program above would type-check. Specifically, the type checker would not check that all sizes in a type could be instantiated with known sizes, only those that were used function parameters in certain complex ways. Compilation would still work in this specific case, but there were more complex cases where it would result in broken code.

Significantly better handling of constants

Top-level constants were always a bit of an afterthought in Futhark. Originally we didn’t have them at all, and you’d just write them as functions that took an empty tuple:

let physicists_pi () = 4

Even after the source language got support for proper constants, we still compiled them to functions in the core language, and replaced uses of the constant with a call to the corresponding function. This works fine as long as the constants are things like array literals or scalar operations that get constant-folded by the compiler anyway. But consider if we do something like this:

let expensive_constant = i32.sum (iota 1000000000)

let f (x: i32) =
  x * expensive_constant + expensive_constant

Under the old compilation scheme, this would look roughly like this in the compiler IR:

let expensive_constant () = i32.sum (iota 1000000000)

let f (x: i32) =
  x * expensive_constant () + expensive_constant ()

Now suddenly that expensive constant is evaluated twice. Since it really is a constant, one might expect that the compiler would simply compute it at compile-time, but to avoid extremely long compilation times, the Futhark compiler does not evaluate loops (except in trivial cases).

And it gets worse. Futhark inlines a lot, so the definition of that constant gets duplicated:

let f (x: i32) =
  x * i32.sum (iota 1000000000) + i32.sum (iota 1000000000)

In simple cases, common subexpression elimination can often deduplicate this, but there is no guarantee that the expressions that compute constants cannot be very complex, with higher-order functions and internal bindings, that make them look very different. Worse, if the constants are used inside of a loop, then they might end up being recomputed for every iteration of the loop!

This happened to one Futhark programmer who implemented a library of certain cryptographic primitives and made heavy use of constants for representing things such as commonly used numbers in their bignum representation. This library took about six hours to compile, and didn’t run very fast either. After implementing proper support for constants in the compiler, now guaranteeing zero duplication of work or code, the compile time dropped to two minutes (still a lot), and run-time performance increased significantly as well.

There is nothing particularly fancy or clever about how the Futhark compiler now handles constants, so it’s not worth dwelling too much on them. I’m just amazed that it took this long before our naive implementation ran into serious problems.

Various improvements to generated library code

As an actually pure language that doesn’t even sneak side effects in though monads or whatnot, Futhark is not useful for writing applications. However, it is very useful for writing libraries that are then invoked by applications. This does of course mean that Futhark needs to generate well-behaved code that cannot assume too much of how it is going to be (ab)used. In particular, freeing a Futhark context now works properly (in particular, does not leak memory). Most programs that call Futhark don’t free the context prior to the program itself shutting down, but someone wrote a program that needed to frequently create and destroy new contexts.

The improvement to constants also helps library code significantly, as previously constants might get re-computed whenever a Futhark entry point is called. Now constants are computed only once, during context initialisation.

Colours and animation

The trend these days seems to be for the output of console programs to look like fruit salad, and Futhark was awfully old-fashioned with its monochromatic static text. Now the compiler prints the header for type errors in a nice red, and the benchmarking tool prints a progress bar.

Since type errors are the main user interface for the compiler, I hope we can improve them further in the future.