CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China
Institute of Applied Physics and Computational Mathematics, Beijing 100094, China
Ze-yao MO, E-mail: zeyao_mo@iapcm.ac.cn
Print publication date: 2018-10
Received: 2018-07-07
Revised: 2018-10-15
Ze-yao Mo. Extreme-scale parallel computing: bottlenecks and strategies [J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(10): 1251-1260. DOI: 10.1631/FITEE.1800421.
Extreme-scale numerical simulations depend heavily on extreme-scale parallel computing capability. To address the challenges of reaching exascale, we systematically analyze the major bottlenecks in parallel computing from three perspectives: computational scale, computing efficiency, and programming productivity. For these bottlenecks, we identify key issues that urgently need research and propose coping strategies. This study should help keep the computing capability of numerical simulation software in step with supercomputer peak performance.
Keywords: Extreme scale; Numerical simulation; Parallel computing; Supercomputers