Radix sort is a non-comparative sorting algorithm for sorting things that “behave” like numbers, in that they can be decomposed into digits. This makes it asymptotically more efficient than comparison-based sorts, and it also permits efficient GPU implementation.
The radix sort we will show here is very simple, and not as fast as it could be. The main reason is that it only processes a single digit at a time. Most high-performance implementations consider multiple bits at once. Also, we only sort 32-bit unsigned integers. Most of the work is done by a function
radix_sort_step xs b returns
xs with the elements sorted with respect to bit
To demonstrate how it works, suppose
-- xs = [2, 0, 2, 4, 2, 1, 5, 9] -- b = 1
First, for each element, we compute whether bit
b is set.
We will also need the negation (swapping 0s and 1s).
Compute how many elements do not have bit
For those elements that do not have bit
b set, we use a prefix sum (
scan (+) 0) to compute their 1-indexed positions in the final result. The elements that do have bit
b set are set to 0 by the
Similarly, compute the final positions for the elements that do have bit
b set - note that we also offset these by
idxs1 together. This will give a sensible result because they are never nonzero in the same position.
Our calculations have produced 1-indexed offsets, but Futhark arrays are 0-indexed, so decrement each element.
xs (just to have an array of the right size and type) and scatter the elements of
xs with the indexes we computed.
A full radix sort is then just sequentially looping through each bit position and apply the step function for each.
A useful optimisation is to first check the position of the most significant bit in each array element, and cap the number of iterations to that.