QR factorization of a dense matrix on a shared-memory multiprocessor (Q1120250)
scientific article; zbMATH DE number 4100485
| Language | Label | Description | Also known as |
|---|---|---|---|
| English | QR factorization of a dense matrix on a shared-memory multiprocessor | scientific article; zbMATH DE number 4100485 | |
Statements
QR factorization of a dense matrix on a shared-memory multiprocessor (English)
1989
This paper describes a parallel algorithm for computing the QR-factorization of a dense matrix. The algorithm is designed specifically to have low synchronization overhead on a shared-memory multiprocessor, and it can be implemented in either a synchronous or an asynchronous fashion. In the synchronous version, all processors synchronize with each other before each new annihilation step is started, while in the asynchronous version each processor proceeds independently to compute its share of the entire process. The synchronous version has the smaller synchronization cost and is therefore suited to machines with high synchronization overhead, while the asynchronous version has the smaller processor idle time. However, numerical experiments show little difference between the total execution times of the two implementations. The paper gives a good discussion of the algorithm and its implementation, together with a thorough discussion of synchronization cost, work load distribution, and performance analysis. Finally, numerical experiments are presented which illustrate the superiority of this algorithm to a previous pipelined QR-algorithm by \textit{J. J. Dongarra}, \textit{A. H. Sameh} and \textit{D. C. Sorensen} [ibid. 3, 25-34 (1986; Zbl 0591.65027)].
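For orientation, the sketch below shows a plain sequential QR factorization built from Givens rotations, the elementary operation named in the keywords of this entry. It is not the parallel algorithm of the paper: the elimination order, the function names `givens` and `givens_qr`, and the list-of-lists matrix representation are all assumptions chosen for illustration, and no synchronization or work distribution is modeled.

```python
import math

def givens(a, b):
    """Return (c, s) so that the rotation [[c, s], [-s, c]] maps (a, b) to (r, 0)."""
    if b == 0.0:
        return 1.0, 0.0
    r = math.hypot(a, b)
    return a / r, b / r

def givens_qr(A):
    """Sequential QR by Givens rotations (illustrative only); returns (Q, R) with A = Q R."""
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    # Qt accumulates the product of all rotations, so Qt equals Q transposed.
    Qt = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(min(m - 1, n)):       # work through the columns left to right
        for i in range(m - 1, j, -1):    # zero R[i][j] by rotating rows i-1 and i
            c, s = givens(R[i - 1][j], R[i][j])
            for M in (R, Qt):            # apply the same rotation to R and to Qt
                upper, lower = M[i - 1], M[i]
                M[i - 1] = [c * u + s * l for u, l in zip(upper, lower)]
                M[i] = [-s * u + c * l for u, l in zip(upper, lower)]
            R[i][j] = 0.0                # remove rounding residue in the zeroed entry
    Q = [list(col) for col in zip(*Qt)]  # transpose Qt to obtain Q
    return Q, R
```

Each rotation touches only two rows, which is what makes rotations on disjoint row pairs candidates for concurrent execution; how that work is scheduled and synchronized is the subject of the paper and is not reproduced here.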
Givens rotations
parallel algorithm
QR-factorization
shared-memory multiprocessor
synchronization
numerical experiments
performance analysis