Dot Notation for Records

Posted on November 11, 2017 by Troels Henriksen

When we designed the Futhark record system, we went with a notation copied from Standard ML. Specifically, given a record

let foo = { x = 2, y = 3 }

we would access field x by writing #x foo. This differs from the vast majority of languages (including most others in the ML family), which use the ubiquitous dot notation: foo.x. The main reason for adopting Standard ML’s notation was ambiguity, since dot notation was used for the module system. Given an expression foo.x, is this a reference to field x of record foo, or to definition x in the module foo?

OCaml solves this ambiguity by requiring module names and variable names to be lexically distinct, specifically by requiring module names to start with a capital letter. However, adding naming constraints is a bit contrary to our language design philosophy, which states that familiarity and simplicity are key. However, given the ubiquity of dot notation in other languages, our choice to adapt Standard ML’s notation is also in conflict with our philosophy.

The solution, which I believe is also used by F#, is to unify the value and module name spaces. The meaning of foo.x is then always clear, as it can be resolved by the type of the foo in scope. This does not mean that Futhark’s grammar becomes context-dependent, since grammatically a field- and a module access can be treated equivalently.

Of course, this would not be a blog post if not for an additional wrinkle. Futhark supports a range of primitive types, such as i32 (signed 32-bit integers), u32 (unsigned 32-bit integers), f32 (single-precision floats), and so on. For each of these types, the Futhark basis library defines a module that contains various utility functions. For example, there is a module f32 that contains such functions as f32.sqrt, f32.min, and f32.cos. The problem occurs because Futhark’s approach to converting/casting between primitive values was to define a collection of special overloaded functions, one for every type, that could convert an argument of an arbitrary primitive type to the desired type. For example, there was a function f32 that could be called with any primitive value (say, an integer), and would return a corresponding single-precision float. With the unification of value and module name spaces, only one of these would be in scope simultaneously. This was a problem, as both the mathematical modules and the conversion functions were used quite liberally in existing Futhark code.

Since either the conversion functions or the modules would have to be renamed, likely with a longer name, we investigated exactly how frequently they were used, based on the premise that the most frequently used constructs should be given the shortest names. We looked at the Futhark benchmark suite, which constitutes the largest single collection of Futhark code (currently 8000 non-comment and non-blank lines). The analysis showed that 205 lines contained calls to module functions, while 158 lines performed type conversions. But interestingly, the vast majority of the type conversions were either from i32 to f32 (83 instances), or the other way (33 instances). Most of the remaining type conversions involved bit fiddling in a cryptography benchmark.

This inspired a new approach: ditch the overloaded conversion functions entirely, and replace them with monomorphic versions in the per-type modules, as well as succinctly named top-level functions for the most common conversions. Thus, to convert from, say, u8 to i16, you would now call the function u8 in the i16 module: u16.u8. The Futhark basis library prelude (which is implicitly imported by every Futhark program) was augmented with the following function definitions:

-- | Create single-precision float from integer.
let r32 (x: i32): f32 = f32.i32 x
-- | Create integer from single-precision float.
let t32 (x: f32): i32 = i32.f32 x

-- | Create double-precision float from integer.
let r64 (x: i32): f64 = f64.i32 x
-- | Create integer from double-precision float.
let t64 (x: f64): i32 = i32.f64 x

The mnemonics behind the names is that “t” is for “truncate”, and “r” is for “real”, as in “real numbers”. The versions for double-precision floats were mostly added for completeness.

The end result of this whole story is:

Futhark records now use conventional dot notation. Great win!

The majority of type conversions are just as concise as before, although some now require a module qualification.

Some overloaded functions were removed. Since overloading is implemented with compiler magic, and user-defined functions cannot be overloaded, it’s always nice to have fewer of them.