QR factorization of a dense matrix on a shared-memory multiprocessor (Q1120250)

This paper describes a parallel algorithm for computing the QR-factorization of a dense matrix. The algorithm is designed for shared-memory multiprocessors and aims at low synchronization overhead. It can be implemented in either a synchronous or an asynchronous fashion. In the synchronous version, all processors synchronize with each other before each new annihilation step is started; in the asynchronous version, each processor proceeds independently to compute its share of the entire process. The synchronous version has the smaller synchronization cost and is therefore suited to machines with high synchronization overhead, while the asynchronous version has less processor idle time. Numerical experiments show, however, that the total execution times of the two implementations differ little. The paper gives a good discussion of the algorithm and its implementation, together with a thorough treatment of synchronization cost, workload distribution, and performance analysis. Finally, numerical experiments are presented which illustrate the superiority of this algorithm over a previous pipelined QR-algorithm by \textit{J. J. Dongarra}, \textit{A. H. Sameh} and \textit{D. C. Sorensen} [ibid. 3, 25-34 (1986; Zbl 0591.65027)].
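
For readers unfamiliar with the "annihilation step" mentioned in the review, the following is a minimal sequential sketch of QR-factorization by Givens rotations (the technique named in the keywords). It is not the paper's parallel algorithm and does not reproduce its synchronization or work-distribution scheme; the function names `givens` and `qr_givens` and the bottom-up elimination order are assumptions made for illustration only.

```python
import math

def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def qr_givens(A):
    """Sequential Givens QR of an m x n matrix given as a list of rows.

    Returns (Q, R) with A = Q R, Q orthogonal (m x m), R upper triangular (m x n).
    """
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    # Qt accumulates the product of all rotations, i.e. Q transposed.
    Qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(min(n, m - 1)):
        # Annihilate column j from the bottom row up to the subdiagonal.
        for i in range(m - 1, j, -1):
            c, s = givens(R[i - 1][j], R[i][j])
            for M in (R, Qt):
                upper, lower = M[i - 1], M[i]
                M[i - 1] = [c * u + s * l for u, l in zip(upper, lower)]
                M[i] = [-s * u + c * l for u, l in zip(upper, lower)]
            R[i][j] = 0.0  # remove rounding residue in the annihilated entry
    Q = [list(col) for col in zip(*Qt)]
    return Q, R
```

Each rotation touches only two rows, so rotations acting on disjoint row pairs are independent of one another; this independence is the general property that parallel Givens schemes exploit. How the work is actually scheduled across processors is specific to the paper and is not modelled here.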

zbMATH Keywords

Givens rotations

parallel algorithm

QR-factorization

shared-memory multiprocessor

synchronization
