Abstract

A non-comparison-based sort that sorts an array in O(k n) work and O(k log(n)) span, where k is the number of bits in each element. The library contains two variants of radix sort with different use cases. blocked_radix_sort should be used on large arrays, if the array is small then it may be the case that radix_sort is faster. radix_sort can also be much more suitable in cases where nested parallelism is utilized.

Generally, this is the sorting function we recommend for Futhark programs, but be careful about negative integers (use radix_sort_int) and floating-point numbers (use radix_sort_float). If you need a comparison-based sort, consider merge_sort.

Synopsis

val radix_sort [n] 't : (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val with_indices [n] 'a : (xs: [n]a) -> [n](a, i64)
val radix_sort_by_key [n] 't 'k : (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val radix_sort_int [n] 't : (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val radix_sort_int_by_key [n] 't 'k : (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val radix_sort_float [n] 't : (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val radix_sort_float_by_key [n] 't 'k : (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val blocked_radix_sort [n] 't : (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val blocked_radix_sort_by_key [n] 't 'k : (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val blocked_radix_sort_int [n] 't : (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val blocked_radix_sort_int_by_key [n] 't 'k : (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
val blocked_radix_sort_float [n] 't : (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
val blocked_radix_sort_float_by_key [n] 't 'k : (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Description

val radix_sort [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

The num_bits and get_bit arguments can be taken from one of the numeric modules of module type integral or float, such as i32 or f64. However, if you know that the input array only uses lower-order bits (say, if all integers are less than 100), then you can profitably pass a smaller num_bits value to reduce the number of sequential iterations.

Warning: while radix sort can be used with numbers, the bitwise representation of of both integers and floats means that negative numbers are sorted as greater than non-negative. Negative floats are further sorted according to their absolute value. For example, radix-sorting [-2.0, -1.0, 0.0, 1.0, 2.0] will produce [0.0, 1.0, 2.0, -1.0, -2.0]. Use radix_sort_int and radix_sort_float in the (likely) cases that this is not what you want.

val with_indices [n] 'a: (xs: [n]a) -> [n](a, i64)
val radix_sort_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort, but sort based on key function.

val radix_sort_int [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

A thin wrapper around radix_sort that ensures negative integers are sorted as expected. Simply pass the usual num_bits and get_bit definitions from e.g. i32.

val radix_sort_int_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_int, but sort based on key function.

val radix_sort_float [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

A thin wrapper around radix_sort that ensures floats are sorted as expected. Simply pass the usual num_bits and get_bit definitions from f32 and f64.

val radix_sort_float_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_float, but sort based on key function.

val blocked_radix_sort [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

This implementation of radix sort is based on an algorithm which splits the input up into blocks, sorts them and collect them [1]. This leads to performance gains if you choose a good block size based on the GPU thread block. Using 256 as the block size to sort 100 million i32 with an A100 GPU leads to a 2x speedup compared to radix_sort. The sorting algorithm is stable and its work is O(k n) and the span is O(k log(n)) where k is the number of bits in the elements being sorted. In the analysis of the asymptotics we assume the block size is some constant.

[1] N. Satish, M. Harris and M. Garland, "Designing efficient sorting algorithms for manycore GPUs," 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, 2009, pp. 1-10, doi: 10.1109/IPDPS.2009.5161005.

val blocked_radix_sort_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_by_key but blocked.

val blocked_radix_sort_int [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_by_int but blocked.

val blocked_radix_sort_int_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_int_by_key but blocked.

val blocked_radix_sort_float [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_float but blocked.

val blocked_radix_sort_float_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t

Like radix_sort_float_by_key but blocked.

See Also