futhark-0.11.0: An optimising compiler for a functional, array-oriented language.

Safe Haskell None Haskell2010

Futhark.CodeGen.ImpGen.Kernels.SegGenRed

Description

Our compilation strategy for SegGenRed is based around avoiding bin conflicts. We do this by splitting the input into chunks, and for each chunk computing a single subhistogram. Then we combine the subhistograms using an ordinary segmented reduction (SegRed).

There are some branches around to efficiently handle the case where we use only a single subhistogram (because it's large), so that we respect the asymptotics, and do not copy the destination array.

We also use a heuristic strategy for computing subhistograms in local memory when possible. Given:

H: total size of histograms in bytes, including any lock arrays.

G: group size

T: number of bytes of local memory each thread can be given without impacting occupancy (determined experimentally, e.g. 32).

LMAX: maximum amount of local memory per workgroup (hard limit).

We wish to compute:

COOP: cooperation level (number of threads per subhistogram)

LH: number of local memory subhistograms

We do this as:

COOP = ceil(H / T) LH = ceil((G*T)/H) if COOP <= G && H <= LMAX then use local memory else use global memory