Bridging Futhark and SML
Futhark is not a general purpose language, and Futhark programs are typically used as libraries from other languages. This mainly happens through a C-based API, but of course some programmers do not like writing their application code in C. Fortunately, due to C’s popularity, most languages have ways of invoking code written in C. The raw C API exposed by a compiled Futhark program is not particularly convenient by modern standards, so it is usually a good idea to write a bit of glue code to wrap the C types in more ergonomic high level types. Construction of such glue code can be automated, and we call such a generator a bridge. Bridges already exist for languages such as Haskell, Python, Rust (three of them), and OCaml. All of these were written by people not directly involved with the Futhark compiler itself. In this post I will discuss the construction of a new bridge, written by myself, for the Standard ML (SML) language. It may be interesting to people who also need to worry about language interoperability.
Standard ML
SML is a functional language stretching back to the 70s, and was one of the main drivers behind language features that are now common, such as Hindley-Milner type inference. It was also arguably one of the first statically typed functional languages that were generally usable, with several production-grade compilers available since the early 90s. SML is also unusual for being perhaps the only industrial programming language with a formal definition.
Today, SML is nearly dead. The reasons why languages such as OCaml and Haskell managed to overtake SML in popularity are interesting, but a different topic. However, nearly dead is not dead, and not only are multiple high quality SML compilers still maintained (MLton, MLKit, Poly/ML, SML#, SML/NJ, and more), interesting applications written in SML are also still around, and cutting edge research on language implementation is still being conducted using SML compilers. Personally, SML was the first language I was taught at DIKU, and although I dismissed it at the time when compared to Common Lisp and Haskell, I eventually started to appreciate its simplicity. Finally, I work about 10 metres from the maintainers of two different SML implementations (MLKit and MosML). Despite its obscurity, SML is over-represented in my daily life and research.
Bridge construction
Futhark’s C API is designed to be simple to use not just from C, but
also from languages that have fairly crude facilities for invoking C.
We call the language calling Futhark-generated C code the bridge
language. In particular, all types Futhark exposes are fixed size:
either by being standard primitive types (e.g. int32_t), or by being
pointers. Users of the API never have to allocate memory on the C
heap or use sizeof or similar. For example, constructing a
configuration object is done with a function with the type
struct futhark_context_config *futhark_context_config_new(void);
where struct futhark_context_config is an opaque struct. This means
that you cannot use the common performance trick of allowing the
caller to allocate the memory in advance, but in practice most Futhark
functions do so much internal work that this would not be a meaningful
performance advantage. Not all languages can easily interoperate with
unpredictably sized C structs, but most of them can figure out how to
pass pointers around.
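The opaque-struct pattern described above can be sketched in a few lines of C. This is only an illustration of the general technique: the demo_config type and its functions are hypothetical stand-ins, not part of the API that Futhark generates.

```c
#include <assert.h>
#include <stdlib.h>

/* What a caller (or a bridge) sees: a forward declaration and functions
   that traffic only in pointers and primitive types. */
struct demo_config;
struct demo_config *demo_config_new(void);
void demo_config_set_debugging(struct demo_config *cfg, int flag);
int demo_config_get_debugging(const struct demo_config *cfg);
void demo_config_free(struct demo_config *cfg);

/* What only the library sees: the actual layout, which can change
   without affecting any caller. */
struct demo_config {
  int debugging;
  int logging;
};

struct demo_config *demo_config_new(void) {
  /* calloc zero-initialises, so all flags start out disabled. */
  return calloc(1, sizeof(struct demo_config));
}

void demo_config_set_debugging(struct demo_config *cfg, int flag) {
  assert(cfg != NULL);
  cfg->debugging = flag;
}

int demo_config_get_debugging(const struct demo_config *cfg) {
  assert(cfg != NULL);
  return cfg->debugging;
}

void demo_config_free(struct demo_config *cfg) {
  free(cfg);
}
```

The caller never needs sizeof or the field layout, so a bridge language only has to be able to hold and pass an untyped pointer.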
Some functions do require the caller to allocate memory in advance,
such as the function that copies data from a Futhark array (which may
reside on the GPU) to some location in memory. We call this a values
function, and for arrays of type []f64 it might look like this:
int futhark_values_f64_1d(struct futhark_context *ctx, struct futhark_f64_1d *arr, double *data);
The data argument must point to some place in memory with enough
room for the entire array. The size of the array is obtained with a
different API function and multiplied with the element size, which is
statically predictable. There are two subtleties here:
While most languages allow you to create an empty array of some size, passing this array as a pointer to a C function is not necessarily straightforward. This is only safe if the language guarantees that the in-memory representation of the array is “unboxed”, meaning it corresponds to what C expects. If the language does not guarantee this, you need to allocate raw memory, ask Futhark to copy the array to that location, and then copy the elements into the bridge language array, which can be quite slow.
The values function is allowed to copy asynchronously, meaning the copy might still be physically ongoing when the function returns. This allows overlapping copies with other work, which can be important for performance. Unfortunately, many of the potential bridge languages use garbage collection, and most garbage collectors do not guarantee that objects do not move around in memory. If an array was moved whilst Futhark was copying data into it, the results would likely be disastrous. Some languages allow one to allocate non-movable “pinned” memory, but unless great care is taken, it is best to generate glue code that performs a full synchronisation after calling the values function.
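As a rough sketch of the conservative glue described above: query the shape, allocate a buffer of the right size, call the values function, and synchronise before handing the buffer back. To keep the example self-contained, the demo_ types and functions are hypothetical stand-ins that only mimic the shape of the generated API; fetch_f64_1d is likewise an illustrative name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the generated library. */
struct demo_context { int pending; };
struct demo_f64_1d { int64_t shape[1]; double *elems; };

const int64_t *demo_shape_f64_1d(struct demo_context *ctx,
                                 struct demo_f64_1d *arr) {
  (void)ctx;
  return arr->shape;
}

int demo_values_f64_1d(struct demo_context *ctx, struct demo_f64_1d *arr,
                       double *data) {
  memcpy(data, arr->elems, (size_t)arr->shape[0] * sizeof(double));
  ctx->pending = 1; /* pretend the copy may still be in flight */
  return 0;
}

int demo_context_sync(struct demo_context *ctx) {
  ctx->pending = 0; /* all outstanding copies are now complete */
  return 0;
}

/* Glue: copy the array into a freshly allocated buffer and synchronise
   before returning, so the caller never observes a partial copy.
   Returns NULL on failure; the caller frees the result. */
double *fetch_f64_1d(struct demo_context *ctx, struct demo_f64_1d *arr,
                     int64_t *n_out) {
  assert(n_out != NULL);
  int64_t n = demo_shape_f64_1d(ctx, arr)[0];
  /* Element count times element size; allocate at least one element. */
  double *buf = malloc((size_t)(n > 0 ? n : 1) * sizeof(double));
  if (buf == NULL) return NULL;
  if (demo_values_f64_1d(ctx, arr, buf) != 0 ||
      demo_context_sync(ctx) != 0) {
    free(buf);
    return NULL;
  }
  *n_out = n;
  return buf;
}
```

Synchronising on every call gives up the overlap that asynchronous copies allow, which is the price of not having to reason about pinned memory.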
Another feature, minor but a big convenience in practice, is that the Futhark compiler will emit a manifest (a JSON file) that describes the entire generated API in a machine-readable way. This is useful because the set of available functions depends on which entry points are defined in the Futhark program, and which types they expose. Prior to adding the manifest, bridges had to analyse the generated C header file, which is awkward and error-prone.
Futhark’s C API reports errors via return codes, following the usual 0-means-success convention. This is notoriously error-prone and tedious to handle in C, but at least it is very easy to wrap in whichever error handling facility is customary in the bridge language, such as exceptions.
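A minimal sketch of such wrapping, written in C so it stays in the language of the API: a single check function converts a nonzero return code into an error message, which is the point where a bridge would raise an exception. Everything prefixed with demo_ is a hypothetical stand-in, including the assumption that the context hands out its most recent error message as a heap-allocated string that the caller must free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a generated context that records the most
   recent error message. */
struct demo_context {
  char *last_error;
};

/* Stand-in entry point: halves x, failing with a nonzero code when x
   is odd. */
int demo_entry_half(struct demo_context *ctx, int32_t *out, int32_t x) {
  if (x % 2 != 0) {
    static const char text[] = "odd input";
    free(ctx->last_error);
    ctx->last_error = malloc(sizeof text);
    if (ctx->last_error != NULL) memcpy(ctx->last_error, text, sizeof text);
    return 1;
  }
  *out = x / 2;
  return 0;
}

/* Hands ownership of the pending message (or NULL) to the caller. */
char *demo_context_get_error(struct demo_context *ctx) {
  char *msg = ctx->last_error;
  ctx->last_error = NULL;
  return msg;
}

/* Glue: turn a return code into a message in a caller-supplied buffer.
   A bridge would raise an exception carrying this message here. */
int demo_check(struct demo_context *ctx, int rc, char *msg, size_t msg_len) {
  assert(msg != NULL && msg_len > 0);
  msg[0] = '\0';
  if (rc != 0) {
    char *err = demo_context_get_error(ctx);
    if (err != NULL) {
      strncpy(msg, err, msg_len - 1);
      msg[msg_len - 1] = '\0';
      free(err);
    }
  }
  return rc;
}
```

Because every API function follows the same convention, generated glue can route every call through one such check rather than handling errors case by case.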
Memory management is a trickier business. Futhark does not allow the construction of circular data structures and thus internally uses reference counting, but it has no way of knowing when the user is done with some piece of data. Therefore the user is responsible for eventually freeing all data returned by Futhark, using a Futhark-provided function (that in practice just decrements a reference count). While C programmers are famously known for never making mistakes when doing manual memory management, programmers of higher-level languages are more imperfect, and therefore it is advisable for a bridge to hook into the automatic memory management (reference counting, finalizers) of the bridge language.
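To make the "free just decrements a reference count" idea concrete, here is a generic sketch of a reference-counted array handle. It illustrates the technique only and is not a description of Futhark's actual runtime; all demo_ names, and the live-block counter used for checking, are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of live arrays, tracked only so the sketch can be checked. */
static int demo_live_blocks = 0;

struct demo_array {
  int refcount;
  double *elems;
};

/* Creates an array holding one reference, owned by the user. */
struct demo_array *demo_array_new(size_t n) {
  struct demo_array *arr = malloc(sizeof *arr);
  if (arr == NULL) return NULL;
  arr->elems = calloc(n > 0 ? n : 1, sizeof(double));
  if (arr->elems == NULL) {
    free(arr);
    return NULL;
  }
  arr->refcount = 1;
  demo_live_blocks++;
  return arr;
}

/* Internal to the library: take an extra reference, for instance when a
   result shares memory with an input. */
void demo_array_retain(struct demo_array *arr) {
  assert(arr != NULL);
  arr->refcount++;
}

/* User-facing free: drops one reference and releases the memory only
   when no references remain. */
void demo_array_free(struct demo_array *arr) {
  assert(arr != NULL && arr->refcount > 0);
  if (--arr->refcount == 0) {
    free(arr->elems);
    free(arr);
    demo_live_blocks--;
  }
}
```

A bridge with finalizers would arrange for the user-facing free to be called when the wrapping object is collected; without finalizers, the user has to call it explicitly.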
Implementation of smlfut
The Futhark/SML-bridge has been implemented as a program smlfut. You pass it a Futhark manifest file and it will spit out SML code as well as a bit of glue C code. I had hoped to avoid the need for generating C, but I needed to do a few (uninteresting) things that could not be expressed directly in SML.
SML has a very nice module system that allows one to write a
signature that abstractly describes the interface implemented by a
module. Smlfut makes good use of this. For example, the Futhark
program
entry inc (xs: []i32) = map (+2) xs
currently gives rise to this collection of SML signatures:
signature FUTHARK_POLY_ARRAY =
sig
  type array
  type ctx
  type shape
  type elem
  val new: ctx -> elem ArraySlice.slice -> shape -> array
  val free: array -> unit
  val shape: array -> shape
  val values: array -> elem Array.array
  val values_into: array -> elem ArraySlice.slice -> unit
end
signature FUTHARK_OPAQUE =
sig
  type t
  type ctx
  val free : t -> unit
end
signature FUTHARK_RECORD =
sig
  include FUTHARK_OPAQUE
  type record
  val values : t -> record
  val new : ctx -> record -> t
end
signature FUTHARK = sig
  val backend : string
  val version : string
  type ctx
  exception error of string
  type cfg = {logging:bool, debugging:bool, profiling:bool, cache:string option}
  structure Config : sig
    val default : cfg
    val logging : bool -> cfg -> cfg
    val debugging : bool -> cfg -> cfg
    val profiling : bool -> cfg -> cfg
    val cache : string option -> cfg -> cfg
  end
  structure Context : sig
    val new : cfg -> ctx
    val free : ctx -> unit
    val sync : ctx -> unit
  end
  (* []i32 *)
  structure Int32Array1 : FUTHARK_POLY_ARRAY
    where type ctx = ctx
      and type shape = int
      and type elem = Int32.int
  structure Opaque : sig
  end
  structure Entry : sig
    val inc : ctx -> Int32Array1.array -> Int32Array1.array
  end
end
The particularly interested can read the fine manual, but the most
interesting thing is that Futhark arrays must be associated with some
SML type, here the built-in polymorphic array type, such that Futhark
arrays can be converted to SML-comprehensible data. More on that
design decision in a bit.
It is also interesting (for people who find that sort of thing
interesting) that identifiers from the Futhark program can influence
identifiers in the SML program; in this case, the name of the entry
point inc. What happens if a Futhark entry point has a name that is
not a valid SML identifier? For smlfut, I have decided to emit an
error. Users can always pick better public names, and this is better
than coming up with complicated escaping rules. (This rule of course
does not apply to names that are not visible in the API.)
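The kind of check this implies can be sketched as follows. The exact rule smlfut applies is not spelled out here, so this is an assumption based on SML's syntax for alphanumeric identifiers (a letter followed by letters, digits, underscores, or primes) plus its reserved words; is_valid_sml_ident is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Reserved words of Standard ML (core language and module language). */
static const char *const sml_reserved[] = {
  "abstype", "and", "andalso", "as", "case", "datatype", "do", "else",
  "end", "eqtype", "exception", "fn", "fun", "functor", "handle", "if",
  "in", "include", "infix", "infixr", "let", "local", "nonfix", "of",
  "op", "open", "orelse", "raise", "rec", "sharing", "sig", "signature",
  "struct", "structure", "then", "type", "val", "where", "while", "with",
  "withtype"
};

/* Returns true if name can be used unchanged as an SML value identifier. */
bool is_valid_sml_ident(const char *name) {
  assert(name != NULL);
  /* Must start with a letter; this also rejects the empty string. */
  if (!isalpha((unsigned char)name[0])) return false;
  /* The rest may be letters, digits, underscores, or primes. */
  for (const char *p = name + 1; *p != '\0'; p++) {
    unsigned char c = (unsigned char)*p;
    if (!isalnum(c) && c != '_' && c != '\'') return false;
  }
  /* Reserved words are never valid identifiers. */
  for (size_t i = 0; i < sizeof sml_reserved / sizeof sml_reserved[0]; i++) {
    if (strcmp(name, sml_reserved[i]) == 0) return false;
  }
  return true;
}
```

Under this rule an entry point called inc passes, while names such as _inc or val would be rejected with an error rather than escaped.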
Compiler support
Since SML has so many compilers in use, my initial plan was to support
several of them. However, due to unexpected technical friction, only
MLton (and its fork MPL) is currently supported. The main obstacle is
that while the SML Basis Library defines an interface for monomorphic
arrays, which I would expect implies an unboxed representation, only
MLton actually seems to implement this interface for all necessary
primitive types. For example, MLKit (generally my favourite SML
compiler) only implements monomorphic arrays for the types Char8
(uint8_t) and Real (double). SML/NJ, which generally has good support
for the Basis Library (they invented it), also has a rather anemic
selection of monomorphic arrays. This is ironic because in MLton,
polymorphic arrays also have an unboxed monomorphic representation, so
I don’t actually need to use monomorphic array types in order to access
their elements from C. Since smlfut already has a command line option
for switching between monomorphic and polymorphic array interfaces, I
am considering adding a switch that allows one to copy Futhark arrays
into Char8 monomorphic arrays, and leave it up to the user to
interpret the raw bytes as proper numbers.
While MLton’s FFI is somewhat basic, the Futhark C API was designed to accommodate this, and I did not encounter any real obstacles. I did not get very far with the MLKit implementation, but this was due to array type challenges, not the FFI.
Resource management
While MLton does support finalizers, neither MLKit nor MPL does. As
a result, smlfut requires manual resource management by the user.
There is no nice way to put it: this is a major downside. I hope to
come up with a solution somehow, but beyond simply using finalizers
when available, I’m not sure what to do.
What does it feel like to smlfut?
It feels good, if somewhat verbose. This is a small chunk of SML code that makes use of the Futhark program above:
val ctx =
  Futhark.Context.new Futhark.Config.default
val arr_in =
  Futhark.Int32Array1.new ctx (ArraySlice.full (Array.fromList [1, 2, 3])) 3
val arr_out =
  Futhark.Entry.inc ctx arr_in
val arr_sml =
  Futhark.Int32Array1.values arr_out
val () =
  Futhark.Int32Array1.free arr_in
val () =
  Futhark.Int32Array1.free arr_out
val () =
  Futhark.Context.free ctx
And in particular, it has already proven useful for some work I’ve been invited to participate in. More on that in the future, hopefully.