Bridging Futhark and SML

Posted on October 13, 2023

Futhark is not a general-purpose language, and Futhark programs are typically used as libraries from other languages. This mainly happens through a C-based API, but of course some programmers do not like writing their application code in C. Fortunately, due to C’s popularity, most languages have ways of invoking code written in C. The raw C API exposed by a compiled Futhark program is not particularly convenient by modern standards, so it is usually a good idea to write a bit of glue code to wrap the C types in more ergonomic high-level types. Construction of such glue code can be automated, and we call such a generator a bridge. Bridges already exist for languages such as Haskell, Python, Rust (three of them), and OCaml. All of these were written by people not directly involved with the Futhark compiler itself. In this post I will discuss the construction of a new bridge, written by me, for the Standard ML (SML) language. It may be interesting to people who also need to worry about language interoperability.

Standard ML

SML is a functional language whose roots stretch back to the 70s, and it was one of the main drivers behind language features that are now common, such as Hindley-Milner type inference. It was also arguably one of the first statically typed functional languages that were generally usable, with several production-grade compilers available since the early 90s. SML is also unusual for being one of very few industrial programming languages with a formal definition.

Today, SML is nearly dead. The reasons why languages such as OCaml and Haskell managed to overtake SML in popularity are interesting, but a different topic. However, nearly dead is not dead: multiple high-quality SML compilers are still maintained (MLton, MLKit, Poly/ML, SML#, SML/NJ, and more), interesting applications written in SML are still around, and cutting-edge research on language implementation is still being conducted using SML compilers. Personally, SML was the first language I was taught at DIKU, and although I dismissed it at the time when compared to Common Lisp and Haskell, I eventually started to appreciate its simplicity. Finally, I work about 10 metres from the maintainers of two different SML implementations (MLKit and MosML). Despite its obscurity, SML is over-represented in my daily life and research.

Bridge construction

Futhark’s C API is designed to be simple to use not just from C, but also from languages that have fairly crude facilities for invoking C. We call the language that calls the Futhark-generated C code the bridge language. To keep the demands on the bridge language low, all types Futhark exposes are fixed size: either by being standard primitive types (e.g. int32_t), or by being pointers. Users of the API never have to allocate memory on the C heap or use sizeof or similar. For example, constructing a configuration object is done with a function with the type

struct futhark_context_config *futhark_context_config_new(void);

where struct futhark_context_config is an opaque struct. This means that you cannot use the common performance trick of allowing the caller to allocate the memory in advance, but in practice most Futhark functions do so much internal work that this would not be a meaningful performance advantage. Not all languages can easily interoperate with unpredictably sized C structs, but most of them can figure out how to pass pointers around.
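The opaque-pointer pattern described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the demo_ names and the two fields are hypothetical stand-ins, not part of the API Futhark generates. The point is that callers see only a forward declaration and a pointer, so they never need the size or layout of the struct.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a generated API; the demo_ names are NOT
   Futhark's. A real header would expose only these declarations. */
struct demo_config; /* opaque: callers never see the layout */

struct demo_config *demo_config_new(void);
void demo_config_set_debugging(struct demo_config *cfg, int flag);
int demo_config_get_debugging(const struct demo_config *cfg);
void demo_config_free(struct demo_config *cfg);

/* "Library side": the layout is known only here, so it can change
   without affecting callers, which only ever hold a pointer. */
struct demo_config {
  int debugging;
  int logging;
};

struct demo_config *demo_config_new(void) {
  /* The library allocates; the caller never uses sizeof on this type. */
  return calloc(1, sizeof(struct demo_config));
}

void demo_config_set_debugging(struct demo_config *cfg, int flag) {
  assert(cfg != NULL);
  cfg->debugging = flag;
}

int demo_config_get_debugging(const struct demo_config *cfg) {
  assert(cfg != NULL);
  return cfg->debugging;
}

void demo_config_free(struct demo_config *cfg) {
  free(cfg);
}
```

A bridge language then only needs to be able to call functions and hold an address, which is about the least one can ask of a foreign function interface.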

Some functions do require the caller to allocate memory in advance, such as the function that copies data from a Futhark array (which may reside on the GPU) to some location in memory. We call this a values function, and for arrays of type []f64 it might look like this:

int futhark_values_f64_1d(struct futhark_context *ctx, struct futhark_f64_1d *arr, double *data);

The data argument must point to some place in memory with enough room for the entire array. The size of the array is obtained with a different API function and multiplied by the element size, which is statically predictable. There are two subtleties here:

While most languages allow you to create an empty array of some size, passing this array as a pointer to a C function is not necessarily straightforward. This is only safe if the language guarantees that the in-memory representation of the array is “unboxed”, meaning it corresponds to what C expects. If the language does not guarantee this, you need to allocate raw memory, ask Futhark to copy the array to that location, and then copy the elements into the bridge language array, which can be quite slow.
The values function is allowed to copy asynchronously, meaning the copy might still be physically ongoing when the function returns. This allows overlapping copies with other work, which can be important for performance. Unfortunately, many of the potential bridge languages use garbage collection, and most garbage collectors do not guarantee that objects do not move around in memory. If an array was moved whilst Futhark was copying data into it, the results would likely be disastrous. Some languages allow one to allocate non-movable “pinned” memory, but unless great care is taken, it is best to generate glue code that performs a full synchronisation after calling the values function.
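The conservative approach (size the buffer, request the copy, then synchronise before anyone looks at the data) might look roughly like the sketch below. Everything prefixed mock_ is a hypothetical stand-in written for this example, not the generated Futhark API; the mock deliberately defers the copy until the sync call to imitate an asynchronous values function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the generated API (mock_ names are invented).
   The mock context remembers a requested copy and performs it on sync. */
struct mock_ctx {
  const double *pending_src; /* source of a copy not yet performed */
  double *pending_dst;       /* destination, or NULL if nothing pending */
  int64_t pending_n;
};

struct mock_f64_1d {
  int64_t n;
  const double *device_data; /* pretend this lives on the GPU */
};

int64_t mock_shape_f64_1d(const struct mock_f64_1d *arr) { return arr->n; }

int mock_values_f64_1d(struct mock_ctx *ctx, const struct mock_f64_1d *arr,
                       double *data) {
  /* Only records the request; nothing has been written to data yet. */
  ctx->pending_src = arr->device_data;
  ctx->pending_dst = data;
  ctx->pending_n = arr->n;
  return 0;
}

int mock_context_sync(struct mock_ctx *ctx) {
  if (ctx->pending_dst != NULL) {
    memcpy(ctx->pending_dst, ctx->pending_src,
           (size_t)ctx->pending_n * sizeof(double));
    ctx->pending_dst = NULL;
  }
  return 0;
}

/* What glue code might do: element count times element size gives the
   buffer size, and a full sync happens before the buffer is handed out. */
double *fetch_f64_1d(struct mock_ctx *ctx, const struct mock_f64_1d *arr,
                     int64_t *n_out) {
  assert(ctx != NULL && arr != NULL && n_out != NULL);
  int64_t n = mock_shape_f64_1d(arr);
  /* Allocate at least one element so an empty array still gets a buffer. */
  double *buf = malloc((size_t)(n > 0 ? n : 1) * sizeof(double));
  if (buf == NULL) return NULL;
  if (mock_values_f64_1d(ctx, arr, buf) != 0 || mock_context_sync(ctx) != 0) {
    free(buf);
    return NULL;
  }
  *n_out = n;
  return buf; /* the caller owns the buffer and must free it */
}
```

Because the buffer here comes from malloc, it cannot be moved by a garbage collector; a bridge that passes a managed array directly would additionally need the array to be unboxed and either pinned or synchronised before the collector can run.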

Another minor thing, but which turns out to be a big convenience, is that the Futhark compiler will emit a manifest (a JSON file) that describes the entire generated API in a machine-readable way. This is useful, as which functions are available depends on which entry points are defined in the Futhark program, and which types they expose. Prior to adding the manifest, bridges had to analyse the generated C header file, which is awkward and error-prone.

Futhark’s C API reports errors via return codes, following the usual 0-means-success convention. This is notoriously error-prone and tedious to handle in C, but at least it is very easy to wrap in whichever error handling facility is customary in the bridge language, such as exceptions.
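C has no exceptions, but the wrapping idea can still be illustrated: turn the pair of return code and error string into a single value that the caller cannot forget to inspect. The sketch below is an assumption-laden mock. The mock_ functions, the div_result type, and the error text are all invented for this example and only imitate the convention that a nonzero code means failure and that a message can then be fetched from the context.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mock context that remembers the last error message. */
struct mock_ctx { char *last_error; };

static char *copy_string(const char *s) {
  char *copy = malloc(strlen(s) + 1);
  if (copy != NULL) strcpy(copy, s);
  return copy;
}

/* Mock entry point: returns 0 on success, nonzero on failure. */
int mock_entry_div(struct mock_ctx *ctx, int *out, int x, int y) {
  if (y == 0) {
    free(ctx->last_error);
    ctx->last_error = copy_string("division by zero");
    return 1;
  }
  *out = x / y;
  return 0;
}

/* Hands ownership of the message to the caller, who must free it. */
char *mock_context_get_error(struct mock_ctx *ctx) {
  char *msg = ctx->last_error;
  ctx->last_error = NULL;
  return msg;
}

/* The wrapped form: either a value or a message, never a bare code. */
struct div_result {
  int ok;        /* 1 on success, 0 on failure */
  int value;     /* meaningful only when ok is 1 */
  char *message; /* owned by the caller when ok is 0, else NULL */
};

struct div_result checked_div(struct mock_ctx *ctx, int x, int y) {
  assert(ctx != NULL);
  struct div_result r = {0, 0, NULL};
  if (mock_entry_div(ctx, &r.value, x, y) == 0) {
    r.ok = 1;
  } else {
    r.message = mock_context_get_error(ctx);
  }
  return r;
}
```

In a bridge language with exceptions, the failure branch would raise instead of filling in a record, with the message as the payload.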

Memory management is a trickier business. Futhark does not allow the construction of circular data structures and thus internally uses reference counting, but it has no way of knowing when the user is done with some piece of data. Therefore the user is responsible for eventually freeing all data returned by Futhark, using a Futhark-provided function (that in practice just decrements a reference count). While C programmers are famously known for never making mistakes when doing manual memory management, programmers of higher-level languages are less perfect, and therefore it is advisable for a bridge to hook into the automatic memory management (reference counting, finalizers) of the bridge language.
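To make "free just decrements a reference count" concrete, here is a toy model in C. It is not how Futhark is implemented; the mock_ names, the two structs, and the live-block counter are assumptions made for illustration. Two handles share one memory block, and the block is released only when the last handle is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model (all names hypothetical): handles share a counted block. */
struct mock_block { int refcount; double *data; };
struct mock_array { struct mock_block *mem; long n; };

int mock_live_blocks = 0; /* lets a test observe when memory is released */

struct mock_array *mock_array_new(long n) {
  assert(n >= 0);
  struct mock_array *a = malloc(sizeof *a);
  struct mock_block *b = malloc(sizeof *b);
  double *data = calloc((size_t)(n > 0 ? n : 1), sizeof *data);
  if (a == NULL || b == NULL || data == NULL) {
    free(a); free(b); free(data);
    return NULL;
  }
  b->refcount = 1;
  b->data = data;
  a->mem = b;
  a->n = n;
  mock_live_blocks++;
  return a;
}

/* A second handle to the same memory, such as a function might return
   when its result aliases its argument. */
struct mock_array *mock_array_alias(const struct mock_array *src) {
  assert(src != NULL);
  struct mock_array *a = malloc(sizeof *a);
  if (a == NULL) return NULL;
  *a = *src;
  a->mem->refcount++;
  return a;
}

/* The user-facing free: drop one reference, release the block at zero. */
void mock_array_free(struct mock_array *a) {
  if (a == NULL) return;
  if (--a->mem->refcount == 0) {
    free(a->mem->data);
    free(a->mem);
    mock_live_blocks--;
  }
  free(a);
}
```

A bridge that hooks into the host language's finalizers would arrange for the equivalent of mock_array_free to run when a handle becomes unreachable, so that forgetting a call leaks nothing.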

Implementation of `smlfut`

The Futhark/SML-bridge has been implemented as a program smlfut. You pass it a Futhark manifest file and it will spit out SML code as well as a bit of glue C code. I had hoped to avoid the need for generating C, but I needed to do a few (uninteresting) things that could not be expressed directly in SML.

SML has a very nice module system that allows one to write a signature that abstractly describes the interface implemented by a module. Smlfut makes good use of this. For example, the Futhark program

entry inc (xs: []i32) = map (+2) xs

currently gives rise to this collection of SML signatures:

signature FUTHARK_POLY_ARRAY =
sig
  type array
  type ctx
  type shape
  type elem
  val new: ctx -> elem ArraySlice.slice -> shape -> array
  val free: array -> unit
  val shape: array -> shape
  val values: array -> elem Array.array
  val values_into: array -> elem ArraySlice.slice -> unit
end

signature FUTHARK_OPAQUE =
sig
  type t
  type ctx
  val free : t -> unit
end

signature FUTHARK_RECORD =
sig
  include FUTHARK_OPAQUE
  type record
  val values : t -> record
  val new : ctx -> record -> t
end

signature FUTHARK = sig
  val backend : string
  val version : string

  type ctx
  exception error of string
  type cfg = {logging:bool, debugging:bool, profiling:bool, cache:string option}

  structure Config : sig
    val default : cfg
    val logging : bool -> cfg -> cfg
    val debugging : bool -> cfg -> cfg
    val profiling : bool -> cfg -> cfg
    val cache : string option -> cfg -> cfg
  end

  structure Context : sig
    val new : cfg -> ctx
    val free : ctx -> unit
    val sync : ctx -> unit
  end

  (* []i32 *)
  structure Int32Array1 : FUTHARK_POLY_ARRAY
  where type ctx = ctx
    and type shape = int
    and type elem = Int32.int

  structure Opaque : sig
  end

  structure Entry : sig
    val inc : ctx -> Int32Array1.array -> Int32Array1.array
  end
end

The particularly interested can read the fine manual, but the most interesting thing is that Futhark arrays must be associated with some SML type, here the built-in polymorphic array type, such that Futhark arrays can be converted to SML-comprehensible data. More on that design decision in a bit.

It is also interesting (for people who find that sort of thing interesting) that identifiers from the Futhark program can influence identifiers in the SML program; in this case, the name of the entry point inc. What happens if a Futhark entry point has a name that is not a valid SML identifier? For smlfut, I have decided to emit an error. Users can always pick better public names, and this is better than coming up with complicated escaping rules. (This rule of course does not apply to names that are not visible in the API.)
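A check of this kind is easy to write. The sketch below is not smlfut's actual rule, which is not spelled out here; it is a plausible approximation that accepts only alphanumeric SML identifiers (an ASCII letter followed by letters, digits, underscores, or primes) that are not reserved words of the SML core or module language. The function name is invented for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reserved words of the SML core and module languages. */
static const char *const sml_reserved[] = {
  "abstype", "and", "andalso", "as", "case", "datatype", "do", "else",
  "end", "eqtype", "exception", "fn", "fun", "functor", "handle", "if",
  "in", "include", "infix", "infixr", "let", "local", "nonfix", "of",
  "op", "open", "orelse", "raise", "rec", "sharing", "sig", "signature",
  "struct", "structure", "then", "type", "val", "where", "while", "with",
  "withtype"
};

/* Hypothetical helper: true if name could be used as an SML value
   identifier. Assumes the default "C" locale, so only ASCII letters
   count as letters. */
bool is_valid_sml_value_id(const char *name) {
  assert(name != NULL);
  /* Must start with a letter; this rejects "" and names like "_foo",
     which Futhark accepts but SML does not. */
  if (!isalpha((unsigned char)name[0])) return false;
  for (const char *p = name + 1; *p != '\0'; p++) {
    unsigned char c = (unsigned char)*p;
    if (!isalnum(c) && c != '_' && c != '\'') return false;
  }
  for (size_t i = 0; i < sizeof sml_reserved / sizeof sml_reserved[0]; i++) {
    if (strcmp(name, sml_reserved[i]) == 0) return false;
  }
  return true;
}
```

Under these assumptions, a Futhark entry point called inc passes, while one called val or _private would be rejected and the user asked to rename it.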

Compiler support

Since SML has so many compilers in use, my initial plan was to support several of them. However, due to unexpected technical friction, only MLton (and its fork MPL) is currently supported. The main obstacle is that while the SML Basis Library defines an interface for monomorphic arrays, which I would expect implies an unboxed representation, only MLton actually seems to implement this interface for all necessary primitive types. For example, MLKit (generally my favourite SML compiler) only implements monomorphic arrays for the types Char8 (uint8_t) and Real (double). SML/NJ, which generally has good support for the Basis Library (they invented it), also has a rather anemic selection of monomorphic arrays. This is ironic because in MLton, polymorphic arrays also have an unboxed monomorphic representation, so I don’t actually need to use monomorphic array types in order to access their elements from C. Since smlfut already has a command line option for switching between monomorphic and polymorphic array interfaces, I am considering adding a switch that allows one to copy Futhark arrays into Char8 monomorphic arrays, and leave it up to the user to interpret the raw bytes as proper numbers.

While MLton’s FFI is somewhat basic, the Futhark C API was designed to accommodate this, and I did not encounter any real obstacles. I did not get very far with the MLKit implementation, but this was due to array type challenges, not the FFI.

Resource management

While MLton does support finalizers, neither MLKit nor MPL does. As a result, smlfut requires manual resource management by the user. There is no nice way to put it: this is a major downside. I hope to come up with a solution somehow, but beyond simply using finalizers when available, I’m not sure what to do.

What does it feel like to `smlfut`?

It feels good, if somewhat verbose. This is a small chunk of SML code that makes use of the Futhark program above:

val ctx =
  Futhark.Context.new Futhark.Config.default
val arr_in =
  Futhark.Int32Array1.new ctx (ArraySlice.full (Array.fromList [1, 2, 3])) 3
val arr_out =
  Futhark.Entry.inc ctx arr_in
val arr_sml =
  Futhark.Int32Array1.values arr_out
val () =
  Futhark.Int32Array1.free arr_in
val () =
  Futhark.Int32Array1.free arr_out
val () =
  Futhark.Context.free ctx

More importantly, it has already proven useful for some work I’ve been invited to participate in. More on that in the future, hopefully.