Futhark 0.19.1 released

Posted on March 4, 2021 by Troels Henriksen

We made another one (full changelog here). This is a major release, which means that we broke compatibility. So what happened? Did we change type of a fundamental constructs or add a ton of novel type rules? Sort of: the negate function that we deprecated in 0.18.6 is now gone, and you must use neg instead, thus addressing another regret.

Since I don’t think that this ~~change~~ improvement by itself is going to be the killer feature that pushes Futhark into the mainstream, I’ll also mention some of the things we added since the last release announcement.

One improvement that actually landed in 0.18.4, but which I never got around to writing about at the time, is that our CUDA backend now uses a much more efficient implementation of scan, based on a clever single-pass technique. This is really an abuse of the GPU programming model, but NVIDIA promises that it’s fine - and they use the same technique in CUB. With this implementation, scan is almost as fast as map, and since scan is such a crucial building block for many parallel algorithms, this has significant impact by making e.g. radix sorting and filtering much faster. I only wish I knew of a similar technique for AMD GPUs. The work was done by Andreas Nicolaisen and Marco Aslak Persson for their master’s thesis at DIKU.

Another significant improvement is automatic register tiling of certain classes of nested parallel loops. Loop tiling is a family of locality optimisations, which can be quite important for memory-bound code. The Futhark compiler has supported a fairly general block tiling optimisation for years, but register tiling can improve performance on e.g. matrix multiplication by at least another factor of two. Tiling is very sensitive to choice of tile sizes, and the compiler is not yet very good at automatically picking good default sizes, so some manual tweaking of tuning options is needed to obtain the best performance. We hope to address that soon - it shouldn’t be very difficult. This improvement was also done by students, namely Anders Holst and Æmilie Bom Cholewa-Madsen, for their bachelor’s thesis.

The final improvement I will write about is perhaps somewhat obscure, but it touches on interesting aspects of language design. For a long time, Futhark has contained a facility for doing parallel scatter operations, exposed to the user as the function scatter of the following type:

val scatter 't [m] [n] : (dest: *[m]t) -> (is: [n]i64) -> (vs: [n]t) -> *[m]t

Given an array dest and two arrays is and vs of indexes and values respectively, scatter dest is vs returns dest but with the elements at the positions given by is replaced by the corresponding values in vs. This is a useful building block, but limited in that we can only update points in a one-dimensional array. If we wish to scatter into a two-dimensional array, we must first flatten it, manually compute the corresponding flat indexes, perform the scatter, and then unflatten it back to its original shape. This is clumsy. Worse: it confuses the compiler and inhibits optimisation.

So, why is scatter not polymorphic in the dest array? Because to the type checker, scatter is just another function, and the type system is not strong enough to express an property such as “the is array contains p-ary tuples, where p is the rank of dest”. All functions, even those that map directly to compiler intrinsics, must be expressible in the type system. As a workaround, our PhD student Philip Munksgaard added functions scatter_2d and scatter_3d that support 2D and 3D arrays, respectively. This is a pragmatic solution, but it is not beautiful (and what about 4D arrays?).

The great irony is that this problem does not exist in the core language we use inside the compiler. In the core language, scatter (and all other intrinsics) are not functions, but rather special language constructs with their own ad-hoc rules and semantics. We can get away with a lot more in the core language, because it does not have to support type inference or polymorphism. Can we do the same thing in the source language? Possibly! Even in the source language, language constructs that have their own syntax can also have their own ad-hoc rules for type checking and inference. Adding lots of special syntax to a language is not good design, but since scatter is part of the minimal basis, it may deserve it.

Futhark already has a notation for array updates: a with [i] = v, which returns a with the value at index i replaced by v. If we allow i to be an array, then we have a decent notation for scatter (we must also require that v is an array of the same size as i). It is straightforward to use the existing notation for multidimensional updates a with [i,j] = v to then support arbitrary-dimensional scattering.

Type inference such an extended construct is somewhat challenging, and I have yet to make up my mind on whether to require special syntax for the scatter case. If not, the expression a with [i] = v will, in the absence of any other information, give rise to the somewhat convoluted type checking constraint “i must have type i64 or [n]i64 for some n, and in that case v must also be an array of size n. I long ago decided quality of type errors was more important than minimising the number type annotations, but we’ll see whether we can make it work.

One unsolved problem is scatters cousin, reduce_by_index (also known as generalised histogram), which similarly suffers from being limited to a single dimension. However, I have not been able to come up with a good syntax that would allow us to promote it to the level of a language construct.