Designing a Programming Language for the Desert

Posted on June 18, 2018 by Troels Henriksen

Futhark will never be a popular language. Futhark is intended for a niche - developing the small performance-critical core of larger programs - that few programmers ever need to touch. And even for those programmers who do, it will likely not be the majority of their programming. Thus, even were Futhark to become the indisputably best language in its area - which is certainly the goal! - the user base would still be small, and so would the development resources. This post explains some of the design decisions we have made in this context. The overall idea is a bit like trying to thrive in the desert - you can do it, even indefinitely, but you have to respect the realities of the environment, in particular with respect to not overextending or exposing yourself. Fair warning, however: everything I know about desert survival comes from reading Dune. Fairly sure you don’t have to watch out for sandworms in a real desert.

Limitations on the Compiler

We can probably assume that the Futhark development team will always remain fairly small (although we certainly hope to grow it, and we currently both invite open source contributions and hire PhD students - but that’s a story for another day). As a result, we have a strong focus on keeping the compiler maintainable, in particular by not adding too many maintenance-heavy features that increase our exposure to external factors. For example, it is easy to add new compiler backends to Futhark. We have exploited this to add a very nice Python backend in addition to the one for C, and we are currently working on a similar one for C#. It would be a relatively small matter to add similar backends for R, Ruby, Java, Swift, Rust, and so forth. Yet for each of these backends we would have to set up testing and benchmarking infrastructure to ensure they remain operational, even in the face of changes in the library landscape of the target language (for example, defects or modifications made to the way we access OpenCL). Over-extension here would result in either spending all our resources on maintenance of perhaps rarely used features, or else simply producing a buggy compiler.

Another example of our design philosophy is the small number of configuration options exposed by the Futhark compiler. Compare the futhark-opencl manpage with the one for GCC. Also, none of the futhark-opencl options affect the compilation process itself. This is partially a matter of aesthetics (I like programs that just do the right thing), but also a pragmatic engineering decision. Each configuration option multiplies the number of distinct paths through the compiler, and hence the number of places bugs may manifest themselves. A less configurable optimiser is simpler, and therefore easier to maintain.

Limitations on Users

The limitations brought about by limited compiler development resources are fairly obvious. The design choices made to accommodate limited user resources are more subtle. It is a subject for which I have not been able to find much prior coverage. Most new languages are general-purpose, and while they start out obscure, they are often built with the hope (or fantasy) of growing large and successful, and being used for full systems. In a way, they are temporarily embarrassed millionaires, and will prosper if the right investments and risks are made. Their context is a famine, which will pass. But Futhark must prosper in the desert, and the desert is eternal.

As a starting point, we probably cannot assume that Futhark will be the primary language of most of its users, and so we must limit the number of pitfalls and non-obvious behaviours (the principle of least astonishment). In the desert, resources must not be wasted. However, this does not mean that we are primarily concerned with novice programmers. In fact, we can assume that only programmers of above average skill and inclination will want to use Futhark. Further, effective use of Futhark will still require users to think about parallel programming, which is by itself a rare skill (albeit easily learned). We also do not compromise on language design decisions necessary to obtain good performance, since that’s the entire point of a language like Futhark in the first place.

Fortunately, we are well placed in this regard. One of the original design principles behind Futhark was that parallel functional programming should not depend on exotic novel language constructs, but rather arise naturally from the conventional functional vocabulary of higher-order functions (map, reduce, etc). The complexity should be in the compiler, not in the language. This means that prior experience in a common functional language is easily translated to knowledge of Futhark, although some of the restrictions and quirks must of course be learnt.

As Futhark is not suitable for full application programming, a Futhark program will usually be a guest in a larger code base. Polite guests do not rearrange the furniture of their host, and similarly, Futhark should not make arduous demands of whatever build system is already in place. Therefore, Futhark has been designed such that compilation is always done simply by launching the compiler on the Futhark file that contains the entry points. There is no need for fiddling with include paths (none are supported), or interacting with some Futhark-specific build system or project configuration file. There are no hidden caches either: the only file(s) created are in the directory containing the output file. This means there is no hidden state or database that can be corrupted, or must be periodically cleaned.

This notion of zero-configurability is not some “obviously desirable” property whose inclusion should be a no-brainer in any language. It requires a real trade-off in flexibility and ergonomics - a trade-off with costs that we believe are worth it for Futhark, but are likely too steep for other languages.

As an example, let us consider how Futhark splits programs over multiple files. A file foo.fut can refer to definitions in bar.fut by saying import "bar". The imported file is found relative to the path of the importing file. For example, if the file dir1/dir2/foo.fut contains import "bar", the compiler will look for dir1/dir2/bar.fut, and import "../bar" would resolve to dir1/bar.fut.
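The resolution rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the compiler's actual implementation: the function name resolve_import is hypothetical, and the sketch assumes POSIX-style paths and that the .fut extension is appended to the import name.

```python
import posixpath


def resolve_import(importing_file: str, import_name: str) -> str:
    """Resolve `import "name"` relative to the directory of the importing file.

    Hypothetical helper: assumes POSIX-style paths and a ".fut" extension.
    """
    # Imports are looked up relative to the importing file, not a global path.
    base_dir = posixpath.dirname(importing_file)
    # normpath collapses ".." components, so "../bar" climbs one directory.
    return posixpath.normpath(posixpath.join(base_dir, import_name + ".fut"))


print(resolve_import("dir1/dir2/foo.fut", "bar"))     # dir1/dir2/bar.fut
print(resolve_import("dir1/dir2/foo.fut", "../bar"))  # dir1/bar.fut
```

Note that the function needs nothing but the path of the importing file and the import string: there is no include path or project root to pass in.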

This is an unusual design for modern languages (except for scripting languages), where imports are usually resolved relative to some import path that is the same for all files. For example, Haskell uses absolute imports and assumes that the module Foo.Bar.Baz is found in a file Foo/Bar/Baz.hs, relative to an import path that generally does not contain the directory containing the importing file (but usually contains either the working directory of the compiler, or the directory of some root file). The advantage of absolute names is that a file will always be referenced by the same name, no matter where it is imported. With Futhark’s relative imports, the programmer has to construct a path to the desired file. This can make it harder to understand very large programs, especially if they contain multiple files with the same base name.

One advantage of relative importing is conceptual simplicity. Understanding imports requires nothing more than understanding a hierarchical file system in the first place. This means there is less to learn. However, the main advantage is probably composability. If a Futhark program type-checks, then any file making up that program can also be type-checked by itself, simply by passing it directly to the compiler. The practical impact is that it becomes low-effort to write and use tools. For example, the flycheck definition in futhark-mode is less than twenty lines of code, and will work automatically, with no need to read configuration files or guess some “compilation root” or whatnot. C resembles this design a bit, since #include "foo.h" is relative, but in practice it is common to use the -I option to the compiler to expand the include path to some “root”, and treat imports as absolute.

A major downside of relative imports combined with no include path configuration occurs when using third party libraries. The library files must be physically present on the file system relatively close to the user’s program. However, due to how relative imports work, a library mylib can be incorporated simply by copying its source files into a mylib directory next to the user program, and then using import "mylib/foo". We hope that in the future some automated tooling can be built on top of this idea, to automatically fetch and update the code from some central package database, but it remains unclear exactly how it will work. Also, it is not yet clear how to write Futhark libraries that depend on other libraries, short of doing vendoring in the Go sense of the word (basically, bundle whatever libraries you need), which comes with various serious downsides (particularly code bloat), as can be observed in the JavaScript ecosystem.

Summing up

Futhark is not a deer browsing in a lush forest with nary a care in the world. Rather, Futhark is a grumpy desert hedgehog, bristling with spikes, thriving even under the limitations of its harsh environment, and spending its nights hunting scorpions.