When is it OK to modify the numerical behaviour of loops?
The Futhark compiler is able to optimise certain simple loops to closed forms. For example, a loop of the form

```futhark
loop p = init for i < n do
  p && x
```

where `x` is invariant to the loop, will be optimised to `init && x`. This also works for a handful of other types and operators. In particular, a summation loop such as

```futhark
loop p = init for i < n do
  p + x
```

turns into `init + n * x`. This is not done by advanced dataflow analysis, but by a rather simple simplification rule that is really just fancy pattern matching.

These optimisations were initially created long ago for work on program slicing, where you remove parts of a program result and then aggressively perform simplification and code removal, in order to end up with a residual program that has some desirable property. In our case, we wanted to produce residual programs that could give us information about the sizes of things (this was long before size types).
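To make the summation rule above concrete, here is a small sketch of a function containing such a loop. The function and its names are my own hypothetical illustration; only the loop shape is what the simplifier recognises.

```futhark
-- A summation loop where 'x' is invariant to the loop: every
-- iteration adds the same value to the accumulator.
def sum_invariant (n: i64) (x: i64) : i64 =
  loop acc = 0 for i < n do acc + x
```

The simplifier can rewrite the loop body above to the closed form `0 + n * x`, removing the loop entirely.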
Such sliced programs don’t look much like programs written by humans, and many of the simplification rules used to shrink them will rarely apply to real programs. We have nevertheless kept these simplification rules in the simplifier all along, which does mean Futhark has unusually powerful facilities for dead code removal. It also means we have the above-mentioned rules for detecting closed forms of loops. They haven’t been a problem, but recently a fellow researcher wrote a program that behaved oddly. Slightly simplified, the program was this:
```futhark
def f n = f32.sum (replicate n (1/f32.i64 n))
```
This program computes the sum of `n` floating-point numbers that are all of the form `1/n`. Mathematically, this gives 1. With floating-point arithmetic, roundoff error means it is unlikely to produce `1` exactly. Using the closed-form loop optimisation above, the program will be simplified to `1/n*n`, which will usually be `1`.
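If you want to observe the difference yourself, you can compare the reduction-based program against the closed form directly. The following is only a sketch for experimenting (for instance in `futhark repl`); the function names are mine and come neither from the original program nor from the compiler.

```futhark
-- The reduction-based sum of n copies of 1/n.
def sum_of_fractions (n: i64) : f32 =
  f32.sum (replicate n (1 / f32.i64 n))

-- The closed form produced by the simplification.
def closed_form (n: i64) : f32 =
  (1 / f32.i64 n) * f32.i64 n

-- True for those n where the two results differ.
def differs (n: i64) : bool =
  sum_of_fractions n != closed_form n
```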
At first glance, this may look fine. After all, the optimisation actually makes the program result closer to the idealised mathematical result. However, the problem is that the result now depends on whether a specific optimisation happens to fire. This is generally considered bad practice, because most optimisers are black boxes, and it can be difficult and frustrating for programmers to figure out what is going on. Also, Futhark’s interpreter, which we use as the reference semantics of the language, does not perform any optimisations, and so will produce a different result from the compiler. This is unfortunate. It seems clear to me that this is not an optimisation we should do, and so we disabled closed-form simplifications for the case of floating-point arithmetic.
However, there is a twist: the example with `f32.sum` is not a sequential loop (where the order of evaluation is completely specified and we ought not to change it), but rather a parallel loop - a reduction. We only guarantee deterministic reductions when the operator is associative, which floating-point addition is not. Does this give us leeway to optimise the program in ways that affect the result? Sadly, I would argue that it does not. Reductions conceptually work by inserting a provided operator (such as `+`) in between the elements of the array. The reason we require associativity is that we do not guarantee an order of application, i.e., how the parentheses are placed. But it is still the case that the result of the reduction will correspond to some possible order of application - the result will not be plucked out of thin air entirely. In general, I don’t think there is any summation order for the program above that completely eliminates the roundoff error for every `n`.
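As an illustration of this argument, here is a sketch (with names of my own choosing) of two association orders that a reduction with `+` over a four-element array might correspond to. For `f32` the two can differ in the last bits, but each is still a sum of the actual elements under some placement of the parentheses, which is the guarantee a closed form does not necessarily respect.

```futhark
-- Left-to-right order: ((a[0] + a[1]) + a[2]) + a[3]
def sum_left (a: [4]f32) : f32 =
  ((a[0] + a[1]) + a[2]) + a[3]

-- Pairwise (tree-shaped) order: (a[0] + a[1]) + (a[2] + a[3])
def sum_tree (a: [4]f32) : f32 =
  (a[0] + a[1]) + (a[2] + a[3])
```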
Although the fix to this issue involved disabling an optimisation, I don’t think there is any impact on real programs. It did once again remind me of the uncomfortable truth that we don’t have any experienced numerical programmers in the Futhark team - our background is in things like type theory, programming languages, and high-performance computing - so for subtle issues like this, it is difficult for us to know whether we are erring too much on the side of semantic predictability.