This release was not supposed to be a significant one. I only intended for it to remove a few obsolete constructs and fix some bugs. However, close to the release, a handful of nifty features managed to sneak their way in. Interestingly, most of these features were motivated by actual users who encountered various difficulties or made suggestions for improvement. Some of these users were students doing projects at DIKU, where we conduct human trials with Futhark on more or less willing subjects. However, the main source of motivation was Pepijn de Vos who implemented the discrete cosine transform, and also found time to implement a fast automatic Futhark-Python FFI after I remarked that I was not sure it would be possible. This may well be the first release that has been so heavily user-influenced.
One improvement is to the deallocation of memory. While Futhark’s GPU memory manager addressed most of our allocation performance woes, it did not do anything to address peak memory consumption. Specifically, memory management was still centred around basic blocks. We can think of a basic block as a straight-line sequence of statements executed one after another. Only at the end of a basic block would allocations go out of scope and thus be deallocated. For example, consider the following sequence of statements, of which some allocate memory, some use it, and some free it:
 1: m1 <- alloc()
 2: m2 <- alloc()
 3: use(m1, m2)
 4: m3 <- alloc()
 5: use(m2, m3)
 6: m4 <- alloc()
 7: use(m3, m4)
 8: free(m1)
 9: free(m2)
10: free(m3)
11: free(m4)
This is a simplified example of a frequent pattern in Futhark-generated code: just before some bulk parallel operation (like a GPU kernel), memory is allocated for the output. This memory is then used as input to some subsequent operation. At the end of the basic block, all allocations performed during its execution are freed (possibly except those used to store the final result, which we elide for simplicity).
It is clear that in the above example, some of the allocations are
held for longer than strictly necessary. For example,
m2 is last
used in line 5, but is not deallocated until line 9. This means we
have more memory allocated at once than strictly necessary. In the
worst case, the device we are running on may not have enough memory to
run the program to completion! This was the problem encountered by
Pepijn de Vos. The solution was a rather simple patch
that uses a crude liveness analysis to insert
deallocations after the last statement in which an allocation is used:
 1: m1 <- alloc()
 2: m2 <- alloc()
 3: use(m1, m2)
 4: free(m1)
 5: m3 <- alloc()
 6: use(m2, m3)
 7: free(m2)
 8: m4 <- alloc()
 9: use(m3, m4)
10: free(m3)
11: free(m4)
Now, at any point in time, only two memory blocks are allocated. On one version of the discrete cosine transform, this resulted in a five-fold reduction in peak memory footprint. Not bad for such a simple patch.
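The transformation can be illustrated with a small model. The following is a minimal sketch, not the Futhark compiler's actual implementation. It assumes a single straight-line basic block with no branches, loops, or aliasing, and it ignores memory holding the final result. The statement encoding and the helpers `insert_frees` and `peak_live` are hypothetical names introduced only for this illustration. The sketch removes the end-of-block frees and places each one directly after the block's last use of that allocation. It also counts how many blocks are live at once.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a statement: (operation, memory block names),
# e.g. ("alloc", ["m1"]), ("use", ["m1", "m2"]), ("free", ["m1"]).
Stmt = Tuple[str, List[str]]


def insert_frees(block: List[Stmt]) -> List[Stmt]:
    """Move each free to just after the last statement mentioning that block.

    Crude last-use analysis over straight-line code only: no branches,
    loops, aliasing, or result memory are taken into account.
    """
    body = [s for s in block if s[0] != "free"]
    last_use: Dict[str, int] = {}
    for i, (_, names) in enumerate(body):
        for name in names:
            last_use[name] = i  # later statements overwrite earlier ones
    out: List[Stmt] = []
    for i, stmt in enumerate(body):
        out.append(stmt)
        # Free every block whose last mention is this statement.
        for name in sorted(n for n, j in last_use.items() if j == i):
            out.append(("free", [name]))
    return out


def peak_live(block: List[Stmt]) -> int:
    """Return the largest number of simultaneously allocated blocks."""
    live, peak = set(), 0
    for op, names in block:
        if op == "alloc":
            live.update(names)
        elif op == "free":
            live.difference_update(names)
        peak = max(peak, len(live))
    return peak


# The original eleven-statement example, with all frees at the end.
BEFORE: List[Stmt] = [
    ("alloc", ["m1"]), ("alloc", ["m2"]), ("use", ["m1", "m2"]),
    ("alloc", ["m3"]), ("use", ["m2", "m3"]),
    ("alloc", ["m4"]), ("use", ["m3", "m4"]),
    ("free", ["m1"]), ("free", ["m2"]), ("free", ["m3"]), ("free", ["m4"]),
]

if __name__ == "__main__":
    after = insert_frees(BEFORE)
    print(peak_live(BEFORE), peak_live(after))  # prints "4 2"
```

On the example above, `insert_frees` produces exactly the rewritten eleven-statement sequence, and the peak number of live blocks drops from four to two. A real compiler must also handle control flow and memory that escapes the block, which this sketch deliberately leaves out.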
Another important improvement is to
futharki, the Futhark REPL.
This, too, was motivated by a user (Pepijn again) asking why basic
things did not work, and thus unwittingly shaming me into fixing it.
futharki now supports entering declarations, not
just expressions, which makes experimenting with modularised libraries
much nicer. Unfortunately, the largest problem with futharki remains,
specifically that the interpreter operates on the core language, not
the source language. Since the core language uses a different
(flattened/unzipped) value representation, this makes expressions that
return tuples (or arrays of tuples) produce unexpected results.
Contributions are very welcome!
However, as is the eventual fate of all language designers, I am most
pleased with what this release removes from the Futhark language.
zip are no longer language
constructs, but instead available through library functions. These
were the last remaining function-like language constructs. From now
on, anything that looks like a function in Futhark is a function,
and can be partially applied or passed to a higher-order function.
Happily, this marks the removal of the last legacy construct in Futhark. There is now nothing left that I wish to remove. While it is certain that we will still break compatibility from time to time, it should happen much less frequently than in the past, and only in exotic cases.