Explicit quantification of existential sizes

Posted on October 16, 2021 by Troels Henriksen

When we first designed Futhark, array sizes were not part of the type system. A function such as map had this type:

val map 'a 'b : (a -> b) -> []a -> []b

(In the very earliest days, map was syntax and not a function, but let’s skip past those truly primordial times.)

As array programming often requires maintaining various size invariants, we eventually added support for size types, which lets functions express how the size of their result depends on the size of their parameters:

val map 'a 'b [n] : (a -> b) -> [n]a -> [n]b

val transpose 'a [n] [m] : [n][m]a -> [m][n]a

This is useful, but it does raise the question of how to express functions that return arrays whose sizes cannot be straightforwardly known from the sizes of the arguments. An example of such a function is filter:

val filter 'a [n] : (a -> bool) -> [n]a -> []a

We don’t know what to put inside the brackets in the return type, since we have no way of knowing how many elements will satisfy the property. Our solution is to leave the size blank as above, which we call an anonymous size. Whenever filter is fully applied, as filter p xs, the type checker will invent a new size, say k, and assign the result the type [k]a. We have no idea what this k will be, but we know that it must eventually exist at runtime. Inspired by terms from predicate logic, filter has an existential size:

(α → bool) → [n]α → ∃k.[k]α

(Note the use of Unicode to indicate that this is not a Futhark type.) Under this interpretation, there exists some k that is the size of the array that filter returns. In a dependently typed language, filter would return a dependent pair. For example, this is the type of the Data.Vect.filter function in the dependently typed language Idris:

filter : (elem -> Bool) -> Vect len elem -> (p : Nat ** Vect p elem)

The return value is a pair of a number p and some Vect with p elements. So why did we not implement existential sizes in Futhark in terms of dependent pairs? The main reason is to keep the language simple. Consider an expression length (filter p xs). The length function expects an array, not a dependent pair. If filter returned such a pair, the programmer would need to unpack it in some manner, which is awkward. Dependent pairs are a powerful programming technique, but for Futhark’s domain, we don’t need that power. Simplicity is more important.

That said, I am a PL academic, so of course Futhark’s existential sizes have an explanation in terms of dependent types. Specifically, the expression

length (filter p xs)

is really just syntactic sugar for first binding the result of filter, extracting the size and the array, and then applying length:

let [k] (ys: [k]a) = filter p xs
in length ys

In fact, the type checker will reject any program where this transformation is not possible. (This can happen for programs that abuse type inference sufficiently, but the details are well outside the scope of this post, and it probably will never occur in code written by a sane person.)

For a while, this was the scheme: programmers would put in anonymous sizes, which the type checker would interpret as existential quantifiers, and compilation would happen using a strategy inspired by dependent products. In some cases this proved too restrictive. For example, since each anonymous size would be replaced by a distinct existentially quantified size, there was no way to specify a function that returns a matrix of unknown size, but which is known to be square:

val square : f32 -> [][]f32

The above is interpreted as f32 → ∃nm.[n][m]f32.

We are also unable to express fancy higher-order types with complex constraints, such as a function that returns both an array of existential size, and a function that accepts arrays of precisely that size:

val f : f32 -> ([]f32, []f32 -> bool)

Contrived? Maybe, maybe not. Later we’ll see a function that actually needs this. The root problem is that the existential quantifier is not directly accessible to the programmer, but inferred by the type checker based on inflexible rules.

On a philosophical note, while I do (obviously) believe that type inference is good, I think it is crucial that programmers can always annotate their programs with the full and explicit types. This is not because I think fully elaborated programs are easy to read, but because it means that type inference can always be explained in terms of the language itself, rather than by reduction to some hidden “core” language or logic. Futhark’s anonymous sizes violated this principle, as they elaborated to some more expressive representation private to the type checker. Clearly we had erred.

So, as of recently, Futhark now allows explicit existential quantification of sizes in function return types. Because I prefer ASCII syntax, the notation is with ? instead of ∃, and the quantified names must be enclosed in brackets:

val filter [n] 'a : (a -> bool) -> [n]a -> ?[k].[k]a

Anonymous sizes are still allowed, and can now be explained as syntactic sugar for inserting an existential quantifier at the “nearest” enclosing function arrow.

Now we can give a type for the square function:

val square : f32 -> ?[k].[k][k]f32

And we can express complex relationships:

val f : f32 -> ?[k]([k]f32, [k]f32 -> bool)

Explicit programming with existential quantifiers will likely be rare. My own motivation for supporting this was mostly the violation of my principles for type inference. I am however looking forward to seeing which interesting new things we can express that we could not before. I have already written one somewhat interesting usage of explicit existential quantification.

One somewhat interesting usage of explicit existential quantification

While Futhark is most obviously useful for number crunching with matrices, part of the ambition is to be flexible enough to also support more general data-parallel programming. Let’s consider writing a Futhark function for splitting a string into words - in parallel!

First, let us consider what the type the function should be. In Haskell, we might specify a function that takes a String and returns a list of Strings:

words :: String -> [String]

Unfortunately, Futhark does not support irregular arrays - each element must have the same size. So in Futhark, the straightforward solution is to return an array of (index,length) pairs for representing each word. Also, Futhark does not have a character type, so our strings will actually be byte strings:

type char = u8
val words [n] : [n]char -> ?[k].[k](i64,i64)

This is fine, but… doesn’t seem quite type-safe, does it? There is no connection between those pairs and the string, and we have to manually slice the string to actually extract the words. We could accidentally slice the wrong string, perhaps resulting in out-of-bounds accesses! So let us add a bit more type safety:

type word [n] = ([n](), i64, i64)
val words [n] : [n]char -> ?[k].[k](word [n])
val get [n] : [n]char -> word [n] -> ?[m].[m]char

The parametric type word [n] represents a word in a size-n string. In order to “remember” the size n, we are forced to carry around an array of size n of unit values. At runtime, such an array will take no space beyond its size, but it’s admittedly a bit clumsy. The get function actually extracts a word [n] from a string of size n. As long as we keep the definition of the word type abstract (which can be done with the module system), this lets us avoid out-of-bounds accesses.

But we can still make mistakes! These types do not prevent us from extracting “words” from a different string of the same size - and that string might have word breaks in different positions! To avoid this, we can make words not just return an array, but also the get function:

type word [p] -- abstract, but similar definition to above
val words [n] : [n]char -> ?[p].(word [p] -> ?[m].[m]char,
                                 ?[k].[k](word [p]))

We are returning an existential phantom size p that does not correspond to the size of anything in particular, but forces us to use the returned function word [p] -> ?[m].[m]char to do anything with the array of word [p] values. That function will in practice be a closure that uses the original string.

Finally we have achieved type safety! It is somewhat intricate, but not particularly subtle. The existential quantification is easy to follow.

Oh, and while it feels a bit gauche after all this talk of types, the actual definition of the function can be seen here.