Quantifying Student Projects
Futhark’s main developers work as researchers and teachers at the
University of Copenhagen. One of the perks of
working at a university is a steady supply of free labour, by which I mean students
looking for projects. Each undergraduate student must at least write a
bachelor’s thesis, and each master’s student must at least write a
master’s thesis. Beyond these, students can also do elective projects
if they wish. When I was a student, I always found it very interesting
to participate in real research projects, and I would like current
generations of students to have that experience as well. Allowing
students to participate in Futhark development benefits all: the
students get to spend their time building stuff that (potentially)
actually matters, and Futhark improves in ways we wouldn’t have time
for otherwise. I wrote a little bit about how we make use of student
projects when I wrote my PhD reflections, and again later,
but I thought maybe people would be interested in some numbers about
how much student work has gone into the Futhark compiler.
Anyone who has worked in academia has experience with software built through student-based development. Such software is maintained by generations of students, each of whom contributes a few pieces that are integrated in whatever way is possible, without any major thought towards long-term maintainability or coherent design. The quality is usually poor, documentation often absent, and problems fixed only by throwing more students at it. This is a risk that I was acutely aware of when we began inviting students to work on the compiler, and we have been quite picky regarding what we ultimately merge.
First, there are of course students who simply do not manage to produce a contribution of the required quality. That is perfectly fine and expected: they are still likely to have passed their project, which is evaluated on their fulfilment of the learning goals, not on how much we were able to exploit their labour.
Second, some exploratory projects may produce a contribution that does work, but which is simply too complex compared to what it offers. The best example is the work by Steffen Holst Larsen on Multi-GPU Execution and a Vulkan backend. This was absolutely top-notch work, but neither the compiler infrastructure nor the surrounding software environment (in the case of Vulkan) was ready at the time. Merging these contributions would have imposed a nontrivial maintenance burden on us, without truly benefiting users. Both were nevertheless successful projects, in that we learned things we have made use of since, and Steffen went on to work on compilers at Codeplay.
Let’s talk numbers. First, some caveats. I only have numbers for the projects I have supervised or co-supervised myself. Cosmin Oancea, who founded the Futhark project, has supervised several compiler-related projects, which are not included here. Martin Elsman has also supervised many projects, but mostly about data parallel programming, and less about the compiler itself. I am also leaving out PhD students, as their contributions are on a very different scale.
But as for me, I have (co-)supervised a total of 52 projects: 15 MSc theses, 31 BSc theses, and 6 auxiliary projects. Of these, 15 were not directly related to the compiler or its tooling, but involved such things as implementing parallel algorithms or porting benchmarks. I will not be considering these further, but several of them resulted in work that we still use, or intend to use.
That leaves 37 projects that directly worked on the compiler or its tooling. Of these, 22 resulted in contributions that were integrated in the main compiler code base. I can list a few of the more noteworthy ones:
The CUDA backend by Jakob Stokholm Bertelsen.
A rewrite of the fusion engine by Walter Restelli-Nielsen and Amar Topalovic.
The Python/PyOpenCL backends by Daniel Gavin and Hjalte Abelskov.
A C# backend by Mikkel Storgaard (although later removed because we couldn’t maintain it).
Improvements to the autotuner, by various students including Frederik Thorøe, Simon Rotendahl, and Carl Mathias Graae Larsen.
Futhark Language Server by Haoran Sun.
Sum types by Robert Schenck.
Register tiling by Anders Holst and Æmilie Cholewa-Madsen.
WebAssembly backend by Philip Rajani Lassen.
The multicore backend by Duc Minh Tran.
The ISPC backend by Louis Marott Normann, Kristoffer August Kortbæk, William Pema Norbu Holmes Malling, and Oliver Bak Kjersgaard Petersen.
Scalar migration by Philip Jon Børgesen.
Reactive benchmarking by Aleksander Junge.
Locality optimisations by Oscar Nelin and Bjarke Lohmann Pedersen (not quite merged yet, but they will be once I find the time to do all the fit and finish).
These range from significant new parts of the compiler (such as new backends) to rewrites of existing passes (the fusion engine, locality optimisations). The latter are actually the hardest to integrate, as we want to avoid performance regressions, and the current behaviour is often a mix of hacks and ad hoc implementation quirks.
Futhark has definitely benefited from student work in the past, and I am certain that it will continue to do so in the future.