Implementations of QR decomposition. At the moment, these are not
very fast for single large matrices, but they are serviceable.
Performance is quite good on "batches" of many smaller matrices
(i.e. when you map QR decomposition over them), where "small" is
less than 16x16 or 32x16.
Much of this code is based on work by Kasper Unn Weihe, Kristian Quirin Hansen, and Peter Kanstrup Larsen. See their report for details.
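
To illustrate the batched usage pattern described above, here is a minimal sketch in plain Python. It is not this library's implementation and is not taken from the report: the names `qr_mgs`, `matmul`, `batch`, and `results` are hypothetical, and modified Gram-Schmidt is used only because it is short, under the assumption that every input matrix has at least as many rows as columns and full column rank.

```python
import math


def qr_mgs(a):
    """Illustrative QR via modified Gram-Schmidt (hypothetical helper).

    `a` is an m x n matrix given as a list of rows, assumed to have
    m >= n and full column rank. Returns (q, r) with q of shape m x n
    (orthonormal columns) and r of shape n x n (upper triangular).
    """
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        # Remove the components along the already-computed q columns,
        # one at a time, using the partially updated vector each time.
        for k in range(j):
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(m))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(m)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r


def matmul(x, y):
    """Plain matrix product, used here only to check the results."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


# A "batch" of small matrices; each decomposition is independent of the
# others, which is what makes mapping over the batch easy to parallelise.
batch = [
    [[2.0, 1.0], [1.0, 3.0]],
    [[1.0, 2.0], [3.0, 4.0]],
    [[4.0, 0.0], [3.0, 5.0]],
]
results = list(map(qr_mgs, batch))
```

Each element of `results` is a `(q, r)` pair for the corresponding matrix in `batch`. Python's `map` runs sequentially here; the point is only that the per-matrix work has no dependencies between batch elements.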