Implementations of QR decomposition. At the moment, these are not very fast for single large matrices, but they are serviceable. Performance is quite good on batches of many small matrices (i.e. when you map QR decomposition over an array of matrices), where "small" means less than roughly 16x16 or 32x16.

Much of this code is based on work by Kasper Unn Weihe, Kristian Quirin Hansen, and Peter Kanstrup Larsen. See their report for details.


module mk_block_householder: (T: ordered_field) -> {
  val qr [m] [n]: (block_size: i64) -> (A: [m][n]T.t) -> ([m][m]T.t, [m][n]T.t)
}

module mk_gram_schmidt: (T: ordered_field) -> {
  val qr [m] [n]: (A: [m][n]T.t) -> ([m][m]T.t, [m][n]T.t)
}


module mk_block_householder

QR decomposition via the blocked Householder transform. The block size affects performance, although usually only slightly; 16 is a reasonable default. At the moment, the input size must be a multiple of the block size.
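As a minimal usage sketch, the module can be instantiated with `f64` and applied either to a single matrix or mapped over a batch. The import path, the module name `bh`, and the entry point names `qr_single` and `qr_batch` are assumptions made for illustration; only `mk_block_householder` and `qr` come from the signature above.

```futhark
-- Assumed import path; adjust it to wherever this file lives in your package tree.
import "lib/github.com/diku-dk/linalg/qr"

-- Instantiate the blocked Householder implementation for double precision.
module bh = mk_block_householder f64

-- Decompose a single matrix with the suggested default block size of 16.
-- Per the note above, the input size must be a multiple of the block size.
entry qr_single [m][n] (A: [m][n]f64) : ([m][m]f64, [m][n]f64) =
  bh.qr 16 A

-- Decompose a batch of k matrices by mapping qr over the outer dimension.
-- The block size is a parameter so that it can be chosen to suit small inputs.
entry qr_batch [k][m][n] (block_size: i64) (As: [k][m][n]f64)
                       : ([k][m][m]f64, [k][m][n]f64) =
  unzip (map (bh.qr block_size) As)
```

The batched form is the case where, according to the module description, performance is best.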

module mk_gram_schmidt

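QR decomposition with the Gram-Schmidt process. Note: this method is very numerically unstable; in floating-point arithmetic the computed Q can lose orthogonality, especially for ill-conditioned inputs.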
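One way to gauge the instability on a given input is to measure how far the computed Q is from orthogonal. The sketch below is not part of the library: the import path, the module name `gs`, and the helpers `orthogonality_error` and `gram_schmidt_error` are hypothetical names introduced here. It returns the largest absolute entry of Q^T Q - I, which is zero for an exactly orthogonal Q.

```futhark
-- Assumed import path; adjust it to wherever this file lives in your package tree.
import "lib/github.com/diku-dk/linalg/qr"

-- Instantiate the Gram-Schmidt implementation for double precision.
module gs = mk_gram_schmidt f64

-- Hypothetical helper: largest absolute entry of Q^T Q - I.
def orthogonality_error [m] (Q: [m][m]f64) : f64 =
  -- The rows of Qt are the columns of Q.
  let Qt = transpose Q
  -- Entry (i,j) of Q^T Q is the dot product of columns i and j of Q.
  let QtQ = map (\qi -> map (\qj -> f64.sum (map2 (*) qi qj)) Qt) Qt
  -- Subtract the identity entrywise and take absolute values.
  let diff = tabulate_2d m m (\i j -> f64.abs (QtQ[i,j] - f64.bool (i == j)))
  in f64.maximum (flatten diff)

-- Hypothetical entry point: orthogonality error of the Q factor computed for A.
entry gram_schmidt_error [m][n] (A: [m][n]f64) : f64 =
  let (Q, _) = gs.qr A
  in orthogonality_error Q
```

Running the same check on the Q factor from `mk_block_householder` gives a point of comparison for a particular input.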