Why are sizes signed?
In Futhark, array sizes have type i64
- the type of
signed 64-bit integers. This is observable whenever we use a size
parameter as a term-level variable:
def length [n] 't (xs: [n]t) : i64 = n
Since Futhark arrays cannot have a negative size, one might reasonably
ask why sizes are not of type u64
- the type of unsigned 64-bit
integers - which specifically cannot represent negative numbers. This
question often comes from functional programmers, who are used to
making invalid states unrepresentable through types. Perhaps they
have even encountered size-indexed vectors such as Data.Vect in
Idris,
where the type system is used to guarantee the absence of run-time
errors (having this in Futhark would have saved me a blog
post). Were I a Zen master, I
would perhaps respond to such well-meaning questions with
violence,
but while hacker koans are certainly a great inspiration as to the
usefulness of beatings in
teaching,
such approaches do not scale beyond the classroom, so I figured I
should write a blog post justifying this part of Futhark’s design.
First, let us consider what we would gain with unsigned sizes. One is
that iota
would be a total function. Its type is currently this:
val iota : (n: i64) -> [n]i64
Invisible in the type is an implicit requirement that n
is
non-negative. If this is violated, the result is a run-time error.
If the parameter type was instead u64
, such an invalid argument
would not be possible. Of course, in practice there are still many
values of n
for which iota
will not be able to actually construct
an array with n
elements in computer memory, so totality is
in practice still a bit of an illusion.
Another advantage is that we will know, inside functions, that variables corresponding to sizes are non-negative. Well, we know anyway, but now we would know with types! In Idris this is useful because Idris does not actually use “unsigned integers” in the machine sense of the word, but rather inductively defined natural numbers, that you can perform recursion over. The types then act as proof witnesses that help you rule out impossible cases (such as negative sizes) that you would otherwise have no way of handling. In particular, it is useful because Idris vectors have exactly the same inductive structure as the natural numbers - i.e. they are linked lists. In Futhark, arrays are not recursive (and this is crucial for parallelism), so this advantage would be lost.
Now let’s look at the downsides of using u64
for sizes. The most
fundamental problem is that sizes and indexes really ought to have the
same type. Partially this is so we can say things like “an index i
for an array of size n
is in-bounds if 0 <= i < n
”. This becomes
unclear if i
and n
are not the same type (or worse, misleading if
we allow some kind of implicit conversion).
Another reason is that whenever we use explicit indexing instead of
just map
, it is invariably because we want to perform index
arithmetic, and unsigned integers are just not very good at
arithmetic. For example, the subtraction x-y
might not be
representable as an unsigned integer, even if x
and y
are quite
small integers. Overflow can of course also happen for signed
numbers, but it tends only to happen for very large numbers (which are
rare), while unsigned overflow can happen for numbers close to zero
(which are common).
Another annoyance is that unsigned numbers have no value that is
guaranteed to never be an invalid index, such as -1
. These sentinel
values are useful for constructs such as
scatter
, where they represent
results to be discarded. One can of course argue that it would be
even better to use a sum type with a
distinguished constructor to avoid silent errors where mistaken
out-of-bounds indexes are ignored.
Finally, slicing is problematic. In Futhark, as in Python, the way to
reverse an n
-element array x
is
-1:-1:-1] x[n
or with implicit bounds as
-1] x[::
Since Futhark uses inclusive start and exclusive end indexes, as
Dijkstra wills
it,
we must provide -1
as the end index when computing a reverse slice -
and of course, the stride itself is also negative. These values are
not representable with unsigned numbers.
While we eventually want to do more advanced verification of Futhark
programs based on sizes and indexes, this will inevitably require
specialised machinery capable of understanding integer ranges
directly, rather than encoding it in the structure of recursive types.
When such machinery is available, using u64
for sizes has no real
advantage compared to using i64
along a guarantee that the actual
size is never negative. In fact, sticking to only i64
is probably
helpful, as such analyses tend not to be great at handling type
conversions.