Maximizing communication performance via automatic optimization of thread mapping and communication routing

Ahmed Hamdy Ibrahim Abdel-Gawad, Purdue University


The orchestration of communication in distributed-memory parallel applications on a parallel computer system may be viewed as the challenge of matching an application graph (where nodes represent computation and edges represent communication) to the computer system's topology graph (where nodes are processors and edges are network links). Such a matching may be viewed as having two stages: (1) mapping compute nodes onto processor nodes, and (2) routing communication over the interconnection network links. Each of these two stages has a significant impact on overall communication performance (which, in turn, affects overall application performance). The key contributions of this thesis are optimal and/or scalable approaches for enhancing communication performance on both small- and large-scale platforms. My first contribution targets small-scale platforms (e.g., 64 nodes or fewer) by designing an optimal integrated approach to map and route application graphs onto topologies. Moreover, I show that further performance improvements for streaming applications (beyond those obtained by integrated map-and-route) are possible by transforming the application graph before mapping and routing it onto the topology graph. As my second contribution, for large-scale high-performance computing platforms (e.g., thousands of nodes) where scalability is the main focus, I develop an optimal routing approach that can be solved in polynomial time (in contrast with previous approaches that were NP-hard). In my third contribution, I develop divide-and-conquer heuristics that selectively use optimal solutions to smaller subproblems and combine those solutions (again using heuristics) to achieve high-quality MPI process mappings. For a suite of MPI benchmarks, I demonstrate the significant performance benefits of my routing and mapping techniques via a combination of simulation and real measurements on an IBM BlueGene/Q.
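The two-stage view described in the abstract (map tasks to processors, then route traffic over links) can be made concrete with a toy example. The sketch below is not the method developed in the thesis. It assumes a 4-processor ring, shortest-path routing found by breadth-first search, undirected links shared by both directions, and maximum link load as the quality metric. It then enumerates every mapping by brute force, which is feasible only for tiny instances. All names (`shortest_path`, `max_link_load`, `best_mapping`, `ring`, `app`) are hypothetical and chosen for illustration.

```python
from collections import deque
from itertools import permutations


def shortest_path(adj, src, dst):
    """Return one shortest path (list of processors) from src to dst via BFS."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def max_link_load(app_edges, adj, mapping):
    """Route every application edge and return the load on the busiest link.

    Assumption: links are undirected and shared by traffic in both directions.
    """
    load = {}
    for (u, v), volume in app_edges.items():
        path = shortest_path(adj, mapping[u], mapping[v])
        for a, b in zip(path, path[1:]):
            link = (min(a, b), max(a, b))
            load[link] = load.get(link, 0) + volume
    return max(load.values(), default=0)


def best_mapping(app_edges, adj, tasks):
    """Exhaustively try every one-to-one mapping (exponential; toy sizes only)."""
    best = None
    for perm in permutations(sorted(adj), len(tasks)):
        mapping = dict(zip(tasks, perm))
        cost = max_link_load(app_edges, adj, mapping)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best


# Topology graph: a ring of four processors, 0-1-2-3-0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

# Application graph: edge weights are communication volumes (illustrative values).
app = {("a", "b"): 1, ("a", "c"): 10, ("c", "d"): 1}
tasks = ["a", "b", "c", "d"]

# A naive mapping places the heavily communicating tasks a and c two hops apart.
naive = dict(zip(tasks, [0, 1, 2, 3]))
naive_cost = max_link_load(app, ring, naive)
best_cost, best_map = best_mapping(app, ring, tasks)
print(naive_cost, best_cost, best_map)
```

In this toy instance, the naive mapping forces the heavy a-to-c traffic to share a link with the a-to-b traffic, while the best mapping places a and c on adjacent processors so that no link carries more than the heavy edge alone. Exhaustive enumeration grows factorially with the number of tasks, which is why scalable formulations and heuristics matter at larger scales.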




Thottethodi, Purdue University.

Subject Area

Computer Engineering|Computer science
