OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype

Yi-shui LI; Xin-hai CHEN; Jie LIU; Bo YANG; Chun-ye GONG; Xin-biao GAN; Sheng-guo LI; Han XU

doi:10.1631/FITEE.1900075

Your Location：

Home >

Browse articles >

OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype

Regular Papers | Updated：2022-05-19

- OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype
  Enhanced Publication
- OHTMA：面向天河三号E级原型机的一种启发式优化拓扑感知映射算法
- Frontiers of Information Technology & Electronic Engineering Vol. 21, Issue 6, Pages: 939-949(2020)
- Affiliations：
  
  Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China
- Author bio：
  
  [ "Yi-shui LI, E-mail: liyishui_lys@163.com" ]
  [ "Xin-hai CHEN, chenxinhai1995@aliyun.com" ]
  Jie LIU, liujie@nudt.edu.cn
- Funds：
- DOI：10.1631/FITEE.1900075
  CLC： TP319
- Published：2020-06，
  
  Received：09 February 2019，
  
  Revised：09 May 2020，
- Accepted：
Scan QR Code
YI-SHUI LI, XIN-HAI CHEN, JIE LIU, et al. OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype. [J]. Frontiers of information technology & electronic engineering, 2020, 21(6): 939-949.
DOI：

YI-SHUI LI, XIN-HAI CHEN, JIE LIU, et al. OHTMA: an optimized heuristic topology-aware mapping algorithm on the Tianhe-3 exascale supercomputer prototype. [J]. Frontiers of information technology & electronic engineering, 2020, 21(6): 939-949. DOI： 10.1631/FITEE.1900075.

摘要

随着应用程序规模和超级计算机体系结构复杂性的迅速增加，拓扑映射的重要性愈加凸显。高通信成本已成为超级计算机上运行的应用程序性能的主要限制因素。为避免不合适的映射策略可能带来的较差通信性能，提出一种启发式优化拓扑感知映射算法（OHTMA）。该算法旨在最小化用于测量映射结果的字节跳跃度量。OHTMA结合贪婪启发式算法和基于对交换的优化方法，减少了远程通信数量，有效增强了通信局部性。在天河三号E级原型机的实验结果表明，OHTMA算法可显著降低通信成本。

Abstract

With the rapid increase of the size of applications and the complexity of the supercomputer architecture

topology-aware process mapping becomes increasingly important. High communication cost has become a dominant constraint of the performance of applications running on the supercomputer. To avoid a bad mapping strategy which can lead to terrible communication performance

we propose an optimized heuristic topology-aware mapping algorithm (OHTMA). The algorithm attempts to minimize the hop-byte metric that we use to measure the mapping results. OHTMA incorporates a new greedy heuristic method and pair-exchange-based optimization. It reduces the number of long-distance communications and effectively enhances the locality of the communication. Experimental results on the Tianhe-3 exascale supercomputer prototype indicate that OHTMA can significantly reduce the communication costs.

关键词

高性能计算拓扑映射启发式算法

Keywords

High-performance computingTopology mappingHeuristic algorithm

references

T Agarwal, , , A Sharma, , , A Laxmikant, , , 等. . Topologyaware task mapping for reducing communication contention on large parallel machines. . Proc 20th IEEE Int Parallel & Distributed Processing Symp, , 2006. . p.1--10. . DOI:10.1109/IPDPS.2006.1639379http://doi.org/10.1109/IPDPS.2006.1639379..

DH Bailey, , , E Barszcz, , , JT Barton, , , 等. . The NAS parallel benchmarks-summary and preliminary results. . Proc ACM/IEEE Conf on Supercomputing, , 1991. . p.158--165. . DOI:10.1145/125826.125925http://doi.org/10.1145/125826.125925..

A Bhatele. . Automating Topology Aware Mapping for Supercomputers. . PhD Thesis, University of Illinois at Urbana-Champaign, Urbana, USA, , 2010. ..

A Bhatele, , , V Laxmikant. . An evaluative study on the effect of contention on message latencies in large supercomputers. . Proc IEEE Int Symp on Parallel & Distributed Processing, , 2009. . p.1--8. . DOI:10.1109/IPDPS.2009.5161094http://doi.org/10.1109/IPDPS.2009.5161094..

B Brandfass, , , T Alrutz, , , T Gerhold. . Rank reordering for MPI communication optimization. . Comput Fluid, , 2013. . 80372--380. . DOI:10.1016/j.compfluid.2012.01.019http://doi.org/10.1016/j.compfluid.2012.01.019..

X Chen, , , J Liu, , , S Li, , , 等. . TAMM: a new topologyaware mapping method for parallel applications on the Tianhe-2A supercomputer. . Proc 18th Int Conf on Algorithms and Architectures for Parallel Processing, , 2018. . p.242--256. . DOI:10.1007/978-3-030-05051-1_17http://doi.org/10.1007/978-3-030-05051-1_17..

M Deveci, , , K Kaya, , , B Uar, , , 等. . Fast and high quality topology-aware task mapping. . Proc IEEE Int Parallel and Distributed Processing Symp, , 2015. . p.197--206. . DOI:10.1109/IPDPS.2015.93http://doi.org/10.1109/IPDPS.2015.93..

T Hoefler, , , M Snir. . Generic topology mapping strategies for large-scale parallel architectures. . Proc Int Conf on Supercomputing, , 2011. . p.75--84. . DOI:10.1145/1995896.1995909http://doi.org/10.1145/1995896.1995909..

T Hoefler, , , E Jeannot, , , G Mercier. . An overview of topology mapping algorithms and techniques in highperformance computing. In: Jeannot E, ilinskas J (Eds.), High-Performance Computing on Complex Environments. . Wiley, Hoboken, New Jersey, USA, , 2014. . DOI:10.1002/9781118711897.ch5http://doi.org/10.1002/9781118711897.ch5..

E Jeannot, , , G Mercier. . Near-optimal placement of MPI processes on hierarchical NUMA architectures. In: D'Ambra P, Guarracino M, Talia D (Eds.), Euro-Par 2010 Parallel Processing. . Springer Berlin Heidelberg, Germany, , 2010. . p.199--210. . DOI:10.1007/978-3-642-15291-7_20http://doi.org/10.1007/978-3-642-15291-7_20..

E Jeannot, , , G Mercier, , , F Tessier. . Process placement in multicore clusters: algorithmic issues and practical techniques. . IEEE Trans Parall Distrib Syst, , 2014. . 25((4):):993--1002. . DOI:10.1109/TPDS.2013.104http://doi.org/10.1109/TPDS.2013.104..

G Karypis, , , V Kumar. . METIS-A Software Package for Partitioning Unstructured Graphs, Partitioning Meshes and Computing Fill-Reducing Ordering of Sparse Matrices. . Technical Report, University of Minnesota, Minneapolis, USA, , 1998. ..

X Liao, , , Z Pang, , , K Wang, , , 等. . High performance interconnect network for Tianhe system. . J Comput Sci Technol, , 2015. . 30((2):):259--272. . DOI:10.1007/s11390-015-1520-7http://doi.org/10.1007/s11390-015-1520-7..

G Mercier, , , J Clet-Ortega. . Towards an efficient process placement policy for MPI applications in multicore environments. In: Ropo M, Westerholm J, Dongarra J (Eds.), Recent Advances in Parallel Virtual Machine and Message Passing Interface.. . Springer Berlin Heidelberg, Germany, , 2009. . p.104--115. . DOI:10.1007/978-3-642-03770-2_17http://doi.org/10.1007/978-3-642-03770-2_17..

SH Mirsadeghi, , , A Afsahi. . PTRAM: a parallel topology-and routing-aware mapping framework for large-scale HPC systems. . Proc IEEE Int Parallel and Distributed Processing Symp Workshops, , 2016. . p.386--396. . DOI:10.1109/IPDPSW.2016.146http://doi.org/10.1109/IPDPSW.2016.146..

F Pellegrini, , , J Roman. . SCOTCH: a software package for static mapping by dual recursive bipartitioning of process and architecture graphs. . Proc Int Conf and Exhibition on High-Performance Computing and Networking, , 1996. . p.493--498. . DOI:10.1007/3-540-61142-8_588http://doi.org/10.1007/3-540-61142-8_588..

E Rodrigues, , , F Madruga, , , P Navaux, , , 等. . Multi-core aware process mapping and its impact on communication overhead of parallel applications. . Int Symp on Computers and Communications, , 2009. . p.811--817. . DOI:10.1109/ISCC.2009.5202271http://doi.org/10.1109/ISCC.2009.5202271..

S Sahni, , , T Gonzalez. . P-complete approximation problems. . J ACM, , 1976. . 23((3):):555--565. . DOI:10.1145/321958.321975http://doi.org/10.1145/321958.321975..

CD Sudheer, , , A Srinivasan. . Optimization of the hopbyte metric for effective topology aware mapping. . Proc 19th Int Conf on High Performance Computing, , 2012. . p.1--9. . DOI:10.1109/HiPC.2012.6507513http://doi.org/10.1109/HiPC.2012.6507513..

O Tuncer, , , VJ Leung, , , AK Coskun. . PaCMap: topology mapping of unstructured communication patterns onto non-contiguous allocations. . Proc 29th ACM on Int Conf on Supercomputing, , 2015. . p.37--46. . DOI:10.1145/2751205.2751225http://doi.org/10.1145/2751205.2751225..

C Walshaw, , , M Cross. . JOSTLE-parallel multilevel graph-partitioning software: an overview. In: Magouls F (Ed.), Mesh Partitioning Techniques and Domain Decomposition Methods. . Saxe-Coburg Publications, Stirlingshire, UK, , 2007. . p.22--58. . DOI:10.4203/csets.17.2http://doi.org/10.4203/csets.17.2..

T Wang, , , P Qing, , , D Wei, , , 等. . Optimization of process-to-core mapping based on clustering analysis. . Chin J Comput, , 2015. . 38((5):):1044--1055. . ..

BJN Wylie, , , D Bhme, , , B Mohr, , , 等. . Performance analysis of Sweep3D on Blue Gene/P with the Scalasca toolset. . Proc IEEE Int Symp on Parallel & Distributed Processing, Workshops and PhD Forum, , 2010. . p.1--8. . DOI:10.1109/IPDPSW.2010.5470816http://doi.org/10.1109/IPDPSW.2010.5470816..

RJ Zerr, , , RS Baker. . Snap: SN (Discrete Ordinates) Application Proxy-Proxy Description. . Technical Report, LA-UR-13-21070, Los Alamos National Laboratory, Los Alamos, USA, , 2013. ..

Views

Downloads

CSCD

Alert me when the article has been cited

Submit

Tools

Publicity Resources

FTRP: a new fault tolerance framework using process replication and prefetching for high-performance computing

Exploring high-performance processor architecture beyond the exascale

Moving from exascale to zettascale computing: challenges and techniques

Related Author

Wei HU

Guang-ming LIU

Yan-huang JIANG

Xiang-hui XIE

Xun JIA

Xiang-ke LIAO

Kai LU

Can-qun YANG

Related Institution

College of Computer, National University of Defense Technology

National Supercomputer Center in Tianjin

State Key Laboratory of Mathematical Engineering and Advanced Computing

Address：Zhejiang University Press, 148 Tianmushan Road, Hangzhou, China Postal code：310028
Tel：+86-571-88273162 Email：fitee@zju.edu.cn
It is recommended to read the content of this site in Chrome&IE9+. Please switch to extreme mode in browser 360.
Cookies We use cookies to help provide and enhance our service and tailor content. By continuing, you agree to the use of cookies.

⁰