Abstract
A non-comparison-based sort that sorts an array in O(k n) work
and O(k log(n)) span, where k is the number of bits in each element.
The library contains two variants of radix sort with different use
cases. blocked_radix_sort should be used on large arrays,
if the array is small then it may be the case that
radix_sort is faster. radix_sort can also be much
more suitable in cases where nested parallelism is utilized.
Generally, this is the sorting function we recommend for Futhark
programs, but be careful about negative integers (use
radix_sort_int) and floating-point numbers (use
radix_sort_float). If you need a comparison-based sort,
consider merge_sort.
Synopsis
| val radix_sort | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
| val with_indices | [n] 'a : | (xs: [n]a) -> [n](a, i64) |
| val radix_sort_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
| val radix_sort_int | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
| val radix_sort_int_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
| val radix_sort_float | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
| val radix_sort_float_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
| val blocked_radix_sort | [n] 't : | (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
| val blocked_radix_sort_by_key | [n] 't 'k : | (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
| val blocked_radix_sort_int | [n] 't : | (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
| val blocked_radix_sort_int_by_key | [n] 't 'k : | (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
| val blocked_radix_sort_float | [n] 't : | (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
| val blocked_radix_sort_float_by_key | [n] 't 'k : | (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
Description
- ↑val radix_sort [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
The
num_bitsandget_bitarguments can be taken from one of the numeric modules of module typeintegralorfloat, such asi32orf64. However, if you know that the input array only uses lower-order bits (say, if all integers are less than 100), then you can profitably pass a smallernum_bitsvalue to reduce the number of sequential iterations.Warning: while radix sort can be used with numbers, the bitwise representation of of both integers and floats means that negative numbers are sorted as greater than non-negative. Negative floats are further sorted according to their absolute value. For example, radix-sorting
[-2.0, -1.0, 0.0, 1.0, 2.0]will produce[0.0, 1.0, 2.0, -1.0, -2.0]. Useradix_sort_intandradix_sort_floatin the (likely) cases that this is not what you want.- ↑val with_indices [n] 'a: (xs: [n]a) -> [n](a, i64)
- ↑val radix_sort_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort, but sort based on key function.- ↑val radix_sort_int [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
A thin wrapper around
radix_sortthat ensures negative integers are sorted as expected. Simply pass the usualnum_bitsandget_bitdefinitions from e.g.i32.- ↑val radix_sort_int_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_int, but sort based on key function.- ↑val radix_sort_float [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
A thin wrapper around
radix_sortthat ensures floats are sorted as expected. Simply pass the usualnum_bitsandget_bitdefinitions fromf32andf64.- ↑val radix_sort_float_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_float, but sort based on key function.- ↑val blocked_radix_sort [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
This implementation of radix sort is based on an algorithm which splits the input up into blocks, sorts them and collect them [1]. This leads to performance gains if you choose a good
blocksize based on the GPU thread block. Using 256 as theblocksize to sort 100 million i32 with an A100 GPU leads to a 2x speedup compared toradix_sort. The sorting algorithm is stable and its work is O(k n) and the span is O(k log(n)) where k is the number of bits in the elements being sorted. In the analysis of the asymptotics we assume theblocksize is some constant.[1] N. Satish, M. Harris and M. Garland, "Designing efficient sorting algorithms for manycore GPUs," 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, 2009, pp. 1-10, doi: 10.1109/IPDPS.2009.5161005.
- ↑val blocked_radix_sort_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_by_keybut blocked.- ↑val blocked_radix_sort_int [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_by_intbut blocked.- ↑val blocked_radix_sort_int_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_int_by_keybut blocked.- ↑val blocked_radix_sort_float [n] 't: (block: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_floatbut blocked.- ↑val blocked_radix_sort_float_by_key [n] 't 'k: (block: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_float_by_keybut blocked.