Custom tuning parameters - a dubious feature
Yesterday I wrote about the newest Futhark release, but while I highlighted the highly appreciated contributions of students and external contributors, I intentionally left out a somewhat dubious language feature of my own devising. That, however, is the topic of this post.
Tuning parameters
The code generated by the Futhark compiler makes various dynamic decisions on
how to run the code most efficiently. The fanciest example is incremental
flattening, which, for a given instance of
nested parallelism in the source program, will generate multiple code versions
that produce the same result, but exploit the parallelism differently. As a
simple example, for an expression map f xs where f is a function that
contains its own parallel constructs, it may be best either to launch a thread
per iteration of the map and simply run f sequentially, or to exploit the
parallelism in f as well. The underlying tension is that exploiting
parallelism is a good idea if it allows you to fully exploit the computation
resources available, but once that happens, further parallelism is simply
overhead. At run-time, Futhark decides between different versions by comparing
the amount of parallelism in the outer level (that is, the number of iterations
in the map) with a threshold parameter that is some measure of “how much
parallelism is needed to saturate the GPU”. The optimal value for this threshold
parameter is highly dependent on both the application and hardware, and can only
be found through autotuning (paper on the
technique).
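
To make the choice concrete, here is a rough Futhark-level sketch of the kind of decision the generated code makes; the function names and the threshold plumbing are illustrative inventions of mine, not actual compiler output:

-- A function with inner parallelism, and a sequential variant that computes
-- the same result.
def f_parallel (row: []i64) : i64 =
  i64.sum (map (\x -> x * 2) row)

def f_sequential (row: []i64) : i64 =
  loop acc = 0 for x in row do acc + x * 2

-- The generated code picks between two versions of `map f xss` by comparing
-- the amount of outer parallelism to a tunable threshold.
def map_f [n][m] (threshold: i64) (xss: [n][m]i64) : [n]i64 =
  if n >= threshold
  then map f_sequential xss  -- enough outer parallelism to saturate the GPU
  else map f_parallel xss    -- too little: exploit the parallelism inside f too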
Thresholds are an example of what we call a tuning parameter - a knob you can turn that may have performance consequences, but does not affect the value you get out of running a program. Although thresholds are the most prominent, Futhark exposes various other tuning parameters for controlling the GPU grid size, thread block size, number of CPU threads, chunk sizes, cache sizes, and so on. These all have reasonable defaults set by the runtime system, but can be manually tweaked if needed. Tuning parameters are set at program startup via command line options, or by passing in a tuning file.
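
To give a rough picture of what that looks like in practice (going from memory of the Futhark documentation, so the details may be slightly off), a tuning file - such as one produced by futhark autotune - is just a sequence of name=value assignments, one per line, which a compiled program reads at startup via its --tuning option. The parameter names below are invented for the example:

main.suff_outer_par_0=2097152
main.suff_intra_par_1=32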
But why should code generated by the compiler have all the fun? The Futhark compiler cannot be expected to do everything for you, and people who implement algorithms in Futhark also often end up with parameters, such as block sizes or sampling rates, that have no semantic effect but can influence performance. You can of course expose these knobs as ordinary parameters, but that can easily lead to unwieldy APIs. Since Futhark already had a run-time mechanism for tuning parameters, I reasoned that it might be useful for Futhark programs to be able to define their own, with largely the same usage in mind - a hidden, implicit parameter that can be set at program startup, but otherwise has a reasonable default.
Our design for custom tuning parameters
Although conceptually simple, designing custom tuning parameters turned out to be a bit tricky. Each tuning parameter should have two main properties:
- A name.
- A default value of type i64.
The restriction to i64 is because that is what the existing run-time
infrastructure supports, and I did not want to change it.
There is some interesting prior work here. Chapel, a
PGAS language for HPC, supports a feature called configuration
parameters,
which are global variables that can be overridden using
“implementation-dependent means”, which in practice means command-line options
and configuration files. I generally consider Chapel to be a really good and
inspiring design, which is carefully tailored to its intended HPC domain.
Configuration parameters are a good example: while the mechanism is likely not
flexible enough to be useful for full general-purpose programming, it makes it
very easy
to expose configuration knobs without having to worry about the details of
parsing command line parameters, generating help text, and so on. I was very
tempted to copy the design of Chapel’s configuration parameters entirely, which
consists largely of putting config in front of a global variable definition.
It may still turn out that it was a mistake not to copy Chapel’s design.
For Futhark, I wanted to avoid adding new syntax, perhaps because I am unsure whether this design will persist. I also wanted the default value of a parameter to be able to depend dynamically on input data - and I honestly don’t know whether that is a good idea or not.
My first idea for the design was to add a new intrinsic function such as
param, which would be used in the form param("foo", 123). This would then
return 123 unless the tuning parameter "foo" had been overridden by the
user. Since param would have the type param : []u8 -> i64 -> i64, this
enforces that the tuning parameter is a 64-bit integer. Unfortunately, that
[]u8 in the type is a bit of a problem - Futhark does not have strings, merely
some syntactic sugar for byte arrays, and there is no way to require that the
first argument to param is a string literal. What should happen in a case like
param(e, 123), where e is some complicated expression? We don’t want to
evaluate arbitrary code just to determine the name of the tuning parameter!
Therefore we cannot use an intrinsic function for this.
Instead the design is based on
attributes, through which expressions
can have arbitrary metadata associated with them. This also seems appropriate:
tuning parameters are supposed to be “non-semantic”, in the sense that their
value is not supposed to affect the result of the program, in the same way that
attributes can also be freely ignored from a semantic perspective. Tuning
parameters are exposed via the attribute #[param(param_name)], which, when
attached to an expression of type i64, defines a tuning parameter named
param_name, with the default value taken from that expression. Example:
def block_size : i64 = #[param(BLOCKSIZE)] 32

Since the attribute can be attached to arbitrary expressions, we can generate the default value from properties of the input data. For example, maybe it is useful for the block size to be the square root of the length of some input array.
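As a hypothetical sketch of that idea (the function and parameter names are my own invention, not from any real program):

-- Default the block size to the square root of the input length, while still
-- allowing BLOCKSIZE to be overridden when the program starts up.
def block_size (xs: []f64) : i64 =
  #[param(BLOCKSIZE)] (i64.f64 (f64.sqrt (f64.i64 (length xs))))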
One downside of this design is that there is nothing that prevents you from
putting the param attribute on an expression that is not of type i64. In
principle the type checker could detect this, but for perhaps foolish
ideological reasons, I am reluctant to make the type checker too aware of
attributes. In practice, the compiler silently ignores attributes on expressions
of an invalid type.
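
For example (a contrived case of my own), the attribute below is silently dropped, and no tuning parameter is created, because the expression has type f64 rather than i64:

def sampling_rate : f64 = #[param(RATE)] 0.25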
Another problem is that there is nothing that prevents different libraries from defining tuning parameters of the same name, which will then conflict. In practice, this means that setting one tuning parameter will also set the other, even though they can have different default values. The root problem here is that tuning parameters inhabit a single global namespace, and it is not quite clear to me how that can be resolved. One wrinkle is that Futhark’s ML-style module system allows application of parameterised modules, so a single definition of a tuning parameter in the program text may in fact be duplicated any number of times, and it’s not really clear which of those copies should “win” the name.
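A contrived illustration of the clash (module and parameter names invented for the example):

-- Both definitions claim the name BLOCKSIZE, so overriding it at startup
-- affects both, even though their defaults differ.
module lib_a = { def block : i64 = #[param(BLOCKSIZE)] 32 }
module lib_b = { def block : i64 = #[param(BLOCKSIZE)] 256 }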
One problem, which really isn’t one, is that there is nothing that requires programs to use tuning parameters only for semantically transparent values. You can use the value you get back for whatever you want. Indeed, our tests for this feature depend quite obviously on being able to observe their values. That’s just one of those sharp edges, I suppose.
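For instance, nothing stops a definition like the following (again contrived) from letting the parameter’s value flow straight into the program’s result:

-- The “tuning parameter” FOO is directly observable in the output.
def observable : i64 = 10 + (#[param(FOO)] 0)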
Performance properties
The easiest way to implement tuning parameters is purely dynamically, such that
when we compile an expression #[param(BLOCKSIZE)] e, we emit code that checks
in some context structure whether BLOCKSIZE has been set and, if so, returns
that value, and otherwise evaluates e and returns its value. But tuning parameters
are often used in cases where it would be very useful if the compiler could
know their specific value when generating code, for example to optimise memory
allocation or unroll loops. For Futhark’s GPU backends, this can actually be
done easily enough: Futhark essentially does JIT compilation of GPU
kernels at program startup, and since the
tuning parameters are already set by then, we can just inline them directly
before invoking the kernel compilers. This is a little more tricky for the host
side of the program, since that is C code that is compiled just once.
One solution I have considered is the ability to pass values for tuning parameters to the Futhark compiler itself, where they will then be inlined whenever they are used. The downside of this approach is that it makes the “tuning” part of tuning parameters much more time-consuming, as you will need to completely recompile the program whenever you want to try a new value. Perhaps a mixture would be a good compromise: by default, tuning parameters are a purely run-time mechanism to enable swift iteration, but optionally they can be passed to the compiler and baked in more thoroughly. It may of course be that in some cases the performance advantages only occur when they are passed to the compiler, and so the tuning speed advantage of dynamic tuning parameters is pointless.
Takeaway
Despite not yet having used them for anything real, I remain convinced that custom tuning parameters are a useful feature in principle. I am however unsure that sneaking them in through attributes is the right way - it feels like I am afraid to commit, and I normally detest cowardice in language design. I think we also need to further improve the performance potential of the implementation, but I should probably implement one of those so-called “use cases” first…