Abstract
A non-comparison-based sort that sorts an array in O(k n) work
and O(k log(n)) span, where k is the number of bits in each element.
The library contains two variants of radix sort with different use
cases. blocked_radix_sort
should be used on large arrays,
if the array is small then it may be the case that
radix_sort
is faster. radix_sort
can also be much
more suitable in cases where nested parallelism is utilized.
Generally, this is the sorting function we recommend for Futhark
programs, but be careful about negative integers (use
radix_sort_int
) and floating-point numbers (use
radix_sort_float
). If you need a comparison-based sort,
consider merge_sort
.
Synopsis
val radix_sort | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val with_indices | [n] 'a : | (xs: [n]a) -> [n](a, i64) |
val radix_sort_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val radix_sort_int | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val radix_sort_int_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val radix_sort_float | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val radix_sort_float_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val blocked_radix_sort | [n] 't : | (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val blocked_radix_sort_by_key | [n] 't 'k : | (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val blocked_radix_sort_int | [n] 't : | (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val blocked_radix_sort_int_by_key | [n] 't 'k : | (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val blocked_radix_sort_float | [n] 't : | (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val blocked_radix_sort_float_by_key | [n] 't 'k : | (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
Description
- ↑val radix_sort [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
The
num_bits
andget_bit
arguments can be taken from one of the numeric modules of module typeintegral
orfloat
, such asi32
orf64
. However, if you know that the input array only uses lower-order bits (say, if all integers are less than 100), then you can profitably pass a smallernum_bits
value to reduce the number of sequential iterations.Warning: while radix sort can be used with numbers, the bitwise representation of of both integers and floats means that negative numbers are sorted as greater than non-negative. Negative floats are further sorted according to their absolute value. For example, radix-sorting
[-2.0, -1.0, 0.0, 1.0, 2.0]
will produce[0.0, 1.0, 2.0, -1.0, -2.0]
. Useradix_sort_int
andradix_sort_float
in the (likely) cases that this is not what you want.- ↑val with_indices [n] 'a: (xs: [n]a) -> [n](a, i64)
- ↑val radix_sort_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort
, but sort based on key function.- ↑val radix_sort_int [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
A thin wrapper around
radix_sort
that ensures negative integers are sorted as expected. Simply pass the usualnum_bits
andget_bit
definitions from e.g.i32
.- ↑val radix_sort_int_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_int
, but sort based on key function.- ↑val radix_sort_float [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
A thin wrapper around
radix_sort
that ensures floats are sorted as expected. Simply pass the usualnum_bits
andget_bit
definitions fromf32
andf64
.- ↑val radix_sort_float_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_float
, but sort based on key function.- ↑val blocked_radix_sort [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
This implementation of radix sort is based on an algorithm which splits the input up into blocks, sorts them and collect them [1]. This leads to performance gains if you choose a good
block
size based on the GPU thread block. Using 256 as theblock
size to sort 100 million i32 with an A100 GPU leads to a 2x speedup compared toradix_sort
. The sorting algorithm is stable and its work is O(k n) and the span is O(k log(n)) where k is the number of bits in the elements being sorted. In the analysis of the asymptotics we assume theblock
size is some constant.[1] N. Satish, M. Harris and M. Garland, "Designing efficient sorting algorithms for manycore GPUs," 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, 2009, pp. 1-10, doi: 10.1109/IPDPS.2009.5161005.
- ↑val blocked_radix_sort_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_by_key
but blocked.- ↑val blocked_radix_sort_int [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_by_int
but blocked.- ↑val blocked_radix_sort_int_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_int_by_key
but blocked.- ↑val blocked_radix_sort_float [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_float
but blocked.- ↑val blocked_radix_sort_float_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_float_by_key
but blocked.