Abstract
A non-comparison-based sort that sorts an array in O(k n) work and O(k log(n)) span, where k is the number of bits in each element.
Generally, this is the sorting function we recommend for Futhark
programs, but be careful about negative integers (use
radix_sort_int
) and floating-point numbers (use
radix_sort_float
). If you need a comparison-based sort,
consider merge_sort
.
Synopsis
val radix_sort | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val with_indices | [n] 'a : | (xs: [n]a) -> [n](a, i64) |
val radix_sort_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val radix_sort_int | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val radix_sort_int_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val radix_sort_float | [n] 't : | (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val radix_sort_float_by_key | [n] 't 'k : | (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val chunked_radix_sort | [n] 't : | (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val chunked_radix_sort_by_key | [n] 't 'k : | (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val chunked_radix_sort_int | [n] 't : | (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val chunked_radix_sort_int_by_key | [n] 't 'k : | (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
val chunked_radix_sort_float | [n] 't : | (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t |
val chunked_radix_sort_float_by_key | [n] 't 'k : | (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t |
Description
- ↑val radix_sort [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
The
num_bits
andget_bit
arguments can be taken from one of the numeric modules of module typeintegral
orfloat
, such asi32
orf64
. However, if you know that the input array only uses lower-order bits (say, if all integers are less than 100), then you can profitably pass a smallernum_bits
value to reduce the number of sequential iterations.Warning: while radix sort can be used with numbers, the bitwise representation of of both integers and floats means that negative numbers are sorted as greater than non-negative. Negative floats are further sorted according to their absolute value. For example, radix-sorting
[-2.0, -1.0, 0.0, 1.0, 2.0]
will produce[0.0, 1.0, 2.0, -1.0, -2.0]
. Useradix_sort_int
andradix_sort_float
in the (likely) cases that this is not what you want.- ↑val with_indices [n] 'a: (xs: [n]a) -> [n](a, i64)
- ↑val radix_sort_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort
, but sort based on key function.- ↑val radix_sort_int [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
A thin wrapper around
radix_sort
that ensures negative integers are sorted as expected. Simply pass the usualnum_bits
andget_bit
definitions from e.g.i32
.- ↑val radix_sort_int_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_int
, but sort based on key function.- ↑val radix_sort_float [n] 't: (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
A thin wrapper around
radix_sort
that ensures floats are sorted as expected. Simply pass the usualnum_bits
andget_bit
definitions fromf32
andf64
.- ↑val radix_sort_float_by_key [n] 't 'k: (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_float
, but sort based on key function.- ↑val chunked_radix_sort [n] 't: (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
This implementation of radix sort works on the outside almost like
radix_sort
but the implementation is based on a design where you chunk the input into subarrays [1] and sort them. This leads to performance gains if you choose a goodchunk
size based on the GPU thread block. Using 512 as achunk
size leads to about a 1.5x speedup compared to the normalradix_sort
. The sorting algorithm is stable and its work is O(k n) and the span is O(k log(n)) where k is the number of bits in the elements being sorted. In the analysis of the asymptotics we assume thechunk
size is some constant in the analysis.[1] N. Satish, M. Harris and M. Garland, "Designing efficient sorting algorithms for manycore GPUs," 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, 2009, pp. 1-10, doi: 10.1109/IPDPS.2009.5161005.
- ↑val chunked_radix_sort_by_key [n] 't 'k: (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_by_key
but chunked.- ↑val chunked_radix_sort_int [n] 't: (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_by_int
but chunked.- ↑val chunked_radix_sort_int_by_key [n] 't 'k: (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_int_by_key
but chunked.- ↑val chunked_radix_sort_float [n] 't: (chunk: i16) -> (num_bits: i32) -> (get_bit: i32 -> t -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_float
but chunked.- ↑val chunked_radix_sort_float_by_key [n] 't 'k: (chunk: i16) -> (key: t -> k) -> (num_bits: i32) -> (get_bit: i32 -> k -> i32) -> (xs: [n]t) -> [n]t
Like
radix_sort_float_by_key
but chunked.