Futhark 0.19.1 released
We made another one (full changelog
here). This
is a major release, which means that we broke compatibility. So what
happened? Did we change type of a fundamental
constructs or add a ton of
novel type
rules?
Sort of: the negate
function that we deprecated in 0.18.6 is now
gone, and you must use neg
instead, thus addressing another
regret.
Since I don’t think that this change improvement by itself is
going to be the killer feature that pushes Futhark into the
mainstream, I’ll also mention some of the things we added since the
last release announcement.
One improvement that actually landed in 0.18.4, but which I never got
around to writing about at the time, is that our CUDA backend now uses
a much more efficient implementation of scan
, based on a clever
single-pass
technique.
This is really an abuse of the GPU programming model, but NVIDIA
promises that it’s fine - and they use the same technique in
CUB. With this implementation,
scan
is almost as fast as map
, and since scan
is such a crucial
building block for many parallel
algorithms, this has
significant impact by making e.g. radix sorting and filtering much
faster. I only wish I knew of a similar technique for AMD GPUs. The
work was done by Andreas Nicolaisen and Marco Aslak Persson for their
master’s thesis at
DIKU.
Another significant improvement is automatic register tiling of certain classes of nested parallel loops. Loop tiling is a family of locality optimisations, which can be quite important for memory-bound code. The Futhark compiler has supported a fairly general block tiling optimisation for years, but register tiling can improve performance on e.g. matrix multiplication by at least another factor of two. Tiling is very sensitive to choice of tile sizes, and the compiler is not yet very good at automatically picking good default sizes, so some manual tweaking of tuning options is needed to obtain the best performance. We hope to address that soon - it shouldn’t be very difficult. This improvement was also done by students, namely Anders Holst and Æmilie Bom Cholewa-Madsen, for their bachelor’s thesis.
The final improvement I will write about is perhaps somewhat obscure,
but it touches on interesting aspects of language design. For a long
time, Futhark has contained a facility for doing parallel scatter
operations, exposed to the user as the function scatter
of the
following type:
val scatter 't [m] [n] : (dest: *[m]t) -> (is: [n]i64) -> (vs: [n]t) -> *[m]t
Given an array dest
and two arrays is
and vs
of indexes and
values respectively, scatter dest is vs
returns dest
but with the
elements at the positions given by is
replaced by the corresponding
values in vs
. This is a useful building block, but limited in that
we can only update points in a one-dimensional array. If we wish to
scatter into a two-dimensional array, we must first flatten
it,
manually compute the corresponding flat indexes, perform the
scatter
, and then unflatten
it back to its original shape. This
is clumsy. Worse: it confuses the compiler and inhibits optimisation.
So, why is scatter
not polymorphic in the dest
array? Because to
the type checker, scatter
is just another function, and the type
system is not strong enough to express an property such as “the is
array contains p
-ary tuples, where p
is the rank of dest
”. All
functions, even those that map directly to compiler intrinsics, must
be expressible in the type system. As a workaround, our PhD student
Philip Munksgaard added functions
scatter_2d
and scatter_3d
that support 2D and 3D arrays,
respectively. This is a pragmatic solution, but it is not beautiful
(and what about 4D arrays?).
The great irony is that this problem does not exist in the core
language we use inside the compiler. In the core language, scatter
(and all other intrinsics) are not functions, but rather special
language constructs with their own ad-hoc rules and semantics. We can
get away with a lot more in the core language, because it does not
have to support type inference or polymorphism. Can we do the same
thing in the source language? Possibly! Even in the source language,
language constructs that have their own syntax can also have their own
ad-hoc rules for type checking and inference. Adding lots of special
syntax to a language is not good design, but since scatter
is part
of the minimal
basis, it may
deserve it.
Futhark already has a notation for array updates: a with [i] = v
,
which returns a
with the value at index i
replaced by v
. If we
allow i
to be an array, then we have a decent notation for scatter
(we must also require that v
is an array of the same size as i
).
It is straightforward to use the existing notation for
multidimensional updates a with [i,j] = v
to then support
arbitrary-dimensional scattering.
Type inference such an extended construct is somewhat challenging, and
I have yet to make up my mind on whether to require special syntax for
the scatter
case. If not, the expression a with [i] = v
will, in
the absence of any other information, give rise to the somewhat
convoluted type checking constraint “i
must have type i64
or
[n]i64
for some n
, and in that case v
must also be an array of
size n
. I long ago decided quality of type errors was more
important than minimising the number type annotations, but we’ll see
whether we can make it work.
One unsolved problem is scatter
s cousin, reduce_by_index
(also
known as generalised
histogram),
which similarly suffers from being limited to a single dimension.
However, I have not been able to come up with a good syntax that would
allow us to promote it to the level of a language construct.