We made another one (full changelog here). This is a major release, which means that we broke compatibility. So what happened? Did we change type of a fundamental constructs or add a ton of novel type rules? Sort of: the
negate function that we deprecated in 0.18.6 is now gone, and you must use
neg instead, thus addressing another regret.
Since I don’t think that this
change improvement by itself is going to be the killer feature that pushes Futhark into the mainstream, I’ll also mention some of the things we added since the last release announcement.
One improvement that actually landed in 0.18.4, but which I never got around to writing about at the time, is that our CUDA backend now uses a much more efficient implementation of
scan, based on a clever single-pass technique. This is really an abuse of the GPU programming model, but NVIDIA promises that it’s fine - and they use the same technique in CUB. With this implementation,
scan is almost as fast as
map, and since
scan is such a crucial building block for many parallel algorithms, this has significant impact by making e.g. radix sorting and filtering much faster. I only wish I knew of a similar technique for AMD GPUs. The work was done by Andreas Nicolaisen and Marco Aslak Persson for their master’s thesis at DIKU.
Another significant improvement is automatic register tiling of certain classes of nested parallel loops. Loop tiling is a family of locality optimisations, which can be quite important for memory-bound code. The Futhark compiler has supported a fairly general block tiling optimisation for years, but register tiling can improve performance on e.g. matrix multiplication by at least another factor of two. Tiling is very sensitive to choice of tile sizes, and the compiler is not yet very good at automatically picking good default sizes, so some manual tweaking of tuning options is needed to obtain the best performance. We hope to address that soon - it shouldn’t be very difficult. This improvement was also done by students, namely Anders Holst and Æmilie Bom Cholewa-Madsen, for their bachelor’s thesis.
The final improvement I will write about is perhaps somewhat obscure, but it touches on interesting aspects of language design. For a long time, Futhark has contained a facility for doing parallel scatter operations, exposed to the user as the function
scatter of the following type:
val scatter 't [m] [n] : (dest: *[m]t) -> (is: [n]i64) -> (vs: [n]t) -> *[m]t
Given an array
dest and two arrays
vs of indexes and values respectively,
scatter dest is vs returns
dest but with the elements at the positions given by
is replaced by the corresponding values in
vs. This is a useful building block, but limited in that we can only update points in a one-dimensional array. If we wish to scatter into a two-dimensional array, we must first
flatten it, manually compute the corresponding flat indexes, perform the
scatter, and then
unflatten it back to its original shape. This is clumsy. Worse: it confuses the compiler and inhibits optimisation.
So, why is
scatter not polymorphic in the
dest array? Because to the type checker,
scatter is just another function, and the type system is not strong enough to express an property such as “the
is array contains
p-ary tuples, where
p is the rank of
dest”. All functions, even those that map directly to compiler intrinsics, must be expressible in the type system. As a workaround, our PhD student Philip Munksgaard added functions
scatter_3d that support 2D and 3D arrays, respectively. This is a pragmatic solution, but it is not beautiful (and what about 4D arrays?).
The great irony is that this problem does not exist in the core language we use inside the compiler. In the core language,
scatter (and all other intrinsics) are not functions, but rather special language constructs with their own ad-hoc rules and semantics. We can get away with a lot more in the core language, because it does not have to support type inference or polymorphism. Can we do the same thing in the source language? Possibly! Even in the source language, language constructs that have their own syntax can also have their own ad-hoc rules for type checking and inference. Adding lots of special syntax to a language is not good design, but since
scatter is part of the minimal basis, it may deserve it.
Futhark already has a notation for array updates:
a with [i] = v, which returns
a with the value at index
i replaced by
v. If we allow
i to be an array, then we have a decent notation for
scatter (we must also require that
v is an array of the same size as
i). It is straightforward to use the existing notation for multidimensional updates
a with [i,j] = v to then support arbitrary-dimensional scattering.
Type inference such an extended construct is somewhat challenging, and I have yet to make up my mind on whether to require special syntax for the
scatter case. If not, the expression
a with [i] = v will, in the absence of any other information, give rise to the somewhat convoluted type checking constraint "
i must have type
[n]i64 for some
n, and in that case
v must also be an array of size
n. I long ago decided quality of type errors was more important than minimising the number type annotations, but we’ll see whether we can make it work.
One unsolved problem is
reduce_by_index (also known as generalised histogram), which similarly suffers from being limited to a single dimension. However, I have not been able to come up with a good syntax that would allow us to promote it to the level of a language construct.