FOLLOWUS
Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China
[ "Yi-shui LI, E-mail: liyishui_lys@163.com" ]
[ "Xin-hai CHEN, chenxinhai1995@aliyun.com" ]
Jie LIU, liujie@nudt.edu.cn
Published:2020-06,
Received:09 February 2019,
Revised:09 May 2020,
Scan QR Code
YI-SHUI LI, XIN-HAI CHEN, JIE LIU, et al. OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype. [J]. Frontiers of information technology & electronic engineering, 2020, 21(6): 939-949.
YI-SHUI LI, XIN-HAI CHEN, JIE LIU, et al. OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype. [J]. Frontiers of information technology & electronic engineering, 2020, 21(6): 939-949. DOI: 10.1631/FITEE.1900075.
随着应用程序规模和超级计算机体系结构复杂性的迅速增加,拓扑映射的重要性愈加凸显。高通信成本已成为超级计算机上运行的应用程序性能的主要限制因素。为避免不合适的映射策略可能带来的较差通信性能,提出一种启发式优化拓扑感知映射算法(OHTMA)。该算法旨在最小化用于测量映射结果的字节跳跃度量。OHTMA结合贪婪启发式算法和基于对交换的优化方法,减少了远程通信数量,有效增强了通信局部性。在天河三号E级原型机的实验结果表明,OHTMA算法可显著降低通信成本。
With the rapid increase of the size of applications and the complexity of the supercomputer architecture
topology-aware process mapping becomes increasingly important. High communication cost has become a dominant constraint of the performance of applications running on the supercomputer. To avoid a bad mapping strategy which can lead to terrible communication performance
we propose an optimized heuristic topology-aware mapping algorithm (OHTMA). The algorithm attempts to minimize the hop-byte metric that we use to measure the mapping results. OHTMA incorporates a new greedy heuristic method and pair-exchange-based optimization. It reduces the number of long-distance communications and effectively enhances the locality of the communication. Experimental results on the Tianhe-3 exascale supercomputer prototype indicate that OHTMA can significantly reduce the communication costs.
高性能计算拓扑映射启发式算法
High-performance computingTopology mappingHeuristic algorithm
T Agarwal, , , A Sharma, , , A Laxmikant, , , 等. . Topologyaware task mapping for reducing communication contention on large parallel machines. . Proc 20th IEEE Int Parallel & Distributed Processing Symp, , 2006. . p.1--10. . DOI:10.1109/IPDPS.2006.1639379http://doi.org/10.1109/IPDPS.2006.1639379..
DH Bailey, , , E Barszcz, , , JT Barton, , , 等. . The NAS parallel benchmarks-summary and preliminary results. . Proc ACM/IEEE Conf on Supercomputing, , 1991. . p.158--165. . DOI:10.1145/125826.125925http://doi.org/10.1145/125826.125925..
A Bhatele. . Automating Topology Aware Mapping for Supercomputers. . PhD Thesis, University of Illinois at Urbana-Champaign, Urbana, USA, , 2010. ..
A Bhatele, , , V Laxmikant. . An evaluative study on the effect of contention on message latencies in large supercomputers. . Proc IEEE Int Symp on Parallel & Distributed Processing, , 2009. . p.1--8. . DOI:10.1109/IPDPS.2009.5161094http://doi.org/10.1109/IPDPS.2009.5161094..
B Brandfass, , , T Alrutz, , , T Gerhold. . Rank reordering for MPI communication optimization. . Comput Fluid, , 2013. . 80372--380. . DOI:10.1016/j.compfluid.2012.01.019http://doi.org/10.1016/j.compfluid.2012.01.019..
X Chen, , , J Liu, , , S Li, , , 等. . TAMM: a new topologyaware mapping method for parallel applications on the Tianhe-2A supercomputer. . Proc 18th Int Conf on Algorithms and Architectures for Parallel Processing, , 2018. . p.242--256. . DOI:10.1007/978-3-030-05051-1_17http://doi.org/10.1007/978-3-030-05051-1_17..
M Deveci, , , K Kaya, , , B Uar, , , 等. . Fast and high quality topology-aware task mapping. . Proc IEEE Int Parallel and Distributed Processing Symp, , 2015. . p.197--206. . DOI:10.1109/IPDPS.2015.93http://doi.org/10.1109/IPDPS.2015.93..
T Hoefler, , , M Snir. . Generic topology mapping strategies for large-scale parallel architectures. . Proc Int Conf on Supercomputing, , 2011. . p.75--84. . DOI:10.1145/1995896.1995909http://doi.org/10.1145/1995896.1995909..
T Hoefler, , , E Jeannot, , , G Mercier. . An overview of topology mapping algorithms and techniques in highperformance computing. In: Jeannot E, ilinskas J (Eds.), High-Performance Computing on Complex Environments. . Wiley, Hoboken, New Jersey, USA, , 2014. . DOI:10.1002/9781118711897.ch5http://doi.org/10.1002/9781118711897.ch5..
E Jeannot, , , G Mercier. . Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: D'Ambra P, Guarracino M, Talia D (Eds.), Euro-Par 2010 Parallel Processing. . Springer Berlin Heidelberg, Germany, , 2010. . p.199--210. . DOI:10.1007/978-3-642-15291-7_20http://doi.org/10.1007/978-3-642-15291-7_20..
E Jeannot, , , G Mercier, , , F Tessier. . Process placement in multicore clusters: algorithmic issues and practical techniques. . IEEE Trans Parall Distrib Syst, , 2014. . 25((4):):993--1002. . DOI:10.1109/TPDS.2013.104http://doi.org/10.1109/TPDS.2013.104..
G Karypis, , , V Kumar. . METIS-A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes and Computing Fill-Reducing Ordering of Sparse Matrices. . Technical Report, University of Minnesota, Minneapolis, USA, , 1998. ..
X Liao, , , Z Pang, , , K Wang, , , 等. . High performance interconnect network for Tianhe system. . J Comput Sci Technol, , 2015. . 30((2):):259--272. . DOI:10.1007/s11390-015-1520-7http://doi.org/10.1007/s11390-015-1520-7..
G Mercier, , , J Clet-Ortega. . Towards an efficient process placement policy for MPI applications in multicore environments. In: Ropo M, Westerholm J, Dongarra J (Eds.), Recent Advances in Parallel Virtual Machine and Message Passing Interface.. . Springer Berlin Heidelberg, Germany, , 2009. . p.104--115. . DOI:10.1007/978-3-642-03770-2_17http://doi.org/10.1007/978-3-642-03770-2_17..
SH Mirsadeghi, , , A Afsahi. . PTRAM: a parallel topology-and routing-aware mapping framework for large-scale HPC systems. . Proc IEEE Int Parallel and Distributed Processing Symp Workshops, , 2016. . p.386--396. . DOI:10.1109/IPDPSW.2016.146http://doi.org/10.1109/IPDPSW.2016.146..
F Pellegrini, , , J Roman. . SCOTCH: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs. . Proc Int Conf and Exhibition on High-Performance Computing and Networking, , 1996. . p.493--498. . DOI:10.1007/3-540-61142-8_588http://doi.org/10.1007/3-540-61142-8_588..
E Rodrigues, , , F Madruga, , , P Navaux, , , 等. . Multi-core aware process mapping and its impact on communication overhead of parallel applications. . Int Symp on Computers and Communications, , 2009. . p.811--817. . DOI:10.1109/ISCC.2009.5202271http://doi.org/10.1109/ISCC.2009.5202271..
S Sahni, , , T Gonzalez. . P-complete approximation problems. . J ACM, , 1976. . 23((3):):555--565. . DOI:10.1145/321958.321975http://doi.org/10.1145/321958.321975..
CD Sudheer, , , A Srinivasan. . Optimization of the hopbyte metric for effective topology aware mapping. . Proc 19th Int Conf on High Performance Computing, , 2012. . p.1--9. . DOI:10.1109/HiPC.2012.6507513http://doi.org/10.1109/HiPC.2012.6507513..
O Tuncer, , , VJ Leung, , , AK Coskun. . PaCMap: topology mapping of unstructured communication patterns onto non-contiguous allocations. . Proc 29th ACM on Int Conf on Supercomputing, , 2015. . p.37--46. . DOI:10.1145/2751205.2751225http://doi.org/10.1145/2751205.2751225..
C Walshaw, , , M Cross. . JOSTLE-parallel multilevel graph-partitioning software: an overview. In: Magouls F (Ed.), Mesh Partitioning Techniques and Domain Decomposition Methods. . Saxe-Coburg Publications, Stirlingshire, UK, , 2007. . p.22--58. . DOI:10.4203/csets.17.2http://doi.org/10.4203/csets.17.2..
T Wang, , , P Qing, , , D Wei, , , 等. . Optimization of process-to-core mapping based on clustering analysis. . Chin J Comput, , 2015. . 38((5):):1044--1055. . ..
BJN Wylie, , , D Bhme, , , B Mohr, , , 等. . Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset. . Proc IEEE Int Symp on Parallel & Distributed Processing, Workshops and PhD Forum, , 2010. . p.1--8. . DOI:10.1109/IPDPSW.2010.5470816http://doi.org/10.1109/IPDPSW.2010.5470816..
RJ Zerr, , , RS Baker. . Snap: SN (Discrete Ordinates) Application Proxy-Proxy Description. . Technical Report, LA-UR-13-21070, Los Alamos National Laboratory, Los Alamos, USA, , 2013. ..
Publicity Resources
Related Articles
Related Author
Related Institution