The emergence of multicore architectures and highly scalable platforms motivates the development of novel algorithms and techniques that emphasize concurrency and are tolerant of deep memory hierarchies, as opposed to minimizing raw FLOP counts. While direct solvers are reliable, they are often slow and memory-intensive for large problems. Iterative solvers, on the other hand, are more efficient but, in the absence of robust preconditioners, lack reliability. While preconditioners based on incomplete factorizations ( whenever they exist) are effective for many problems, their parallel scalability is generally limited. In this paper, we advocate the use of banded preconditioners instead and introduce a reordering strategy that enables their extraction. In contrast to traditional bandwidth reduction techniques, our reordering strategy takes into account the magnitude of the matrix entries, bringing the heaviest elements closer to the diagonal, thus enabling the use of banded preconditioners. When used with effective banded solvers-in our case, the Spike solver-we show that banded preconditioners (i) are more robust compared to the broad class of incomplete factorization-based preconditioners, (ii) deliver higher processor performance, resulting in faster time to solution, and (iii) scale to larger parallel configurations. We demonstrate these results experimentally on a large class of problems selected from diverse application domains.
Permuting Large Entries, Sparse Matrices, Algorithm, Indefinite, Scheme, Graphs, Computation, Reduction, Flows, Spike, spectral reordering, weighted reordering, banded preconditioner, incomplete factorization, Krylov subspace methods, parallel preconditioners
Date of this Version