Exploring high-performance processor architecture beyond the exascale

Xiang-hui XIE; Xun JIA

doi:10.1631/FITEE.1800424

Collection: Supercomputing beyond the exascale | Updated: 2022-05-19

- Exploring high-performance processor architecture beyond the exascale
- Frontiers of Information Technology & Electronic Engineering, 2018, Vol. 19, No. 10, pp. 1224-1229
- Affiliation: State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, China
- Authors: Xiang-hui XIE, E-mail: xie.xianghui@meac-skl.cn; Xun JIA, E-mail: jia.xun@meac-skl.cn
- Funding: Project supported by the National Natural Science Foundation of China (Nos. 91430214 and 61732018)
- CLC number: TP303
- Print publication date: 2018-10
- Received: 2018-07-11
- Revised: 2018-10-10
Xiang-hui XIE, Xun JIA. Exploring high-performance processor architecture beyond the exascale [J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(10): 1224-1229. DOI: 10.1631/FITEE.1800424.

Abstract

The ever-increasing demand for high performance in scientific computation and engineering applications will push high-performance computing beyond the exascale. High-performance processors are the core components of a supercomputing system, and their architecture design is crucial to improving system performance. This paper first introduces three architecture design goals for high-performance processors beyond the exascale: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. It then proposes Massa, a high-performance many-core processor architecture with scalar processing and application-specific acceleration, in which a scalar many-core main chip is connected to application-acceleration slave chips. By combining distributed computational resources with application-customized hardware, Massa aims to meet all three goals. Finally, several directions for future research on the Massa architecture are discussed.
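The abstract stays at the architectural level and gives no programming interface. Purely as an aid to intuition, here is a minimal Python sketch of the split it describes, in which Massa pairs a scalar many-core main chip with application-acceleration slave chips: work whose kind matches a slave chip is offloaded to it, and everything else runs on the main chip. The classes `Kernel` and `MassaNode`, the kernel kinds, and the throughput figures are hypothetical assumptions, not taken from the paper, and the timing model ignores data movement between chips.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Kernel:
    """A unit of work tagged with the kind of computation it performs."""
    name: str
    kind: str    # hypothetical tags such as "gemm", "graph", "scalar"
    work: float  # abstract work units


@dataclass
class MassaNode:
    """Toy model of one scalar many-core main chip plus accelerator slave chips.

    All names, kernel kinds, and throughput numbers are hypothetical;
    this is not the Massa design from the paper.
    """
    main_cores: int
    # kernel kind -> throughput of the matching slave chip, in work units per
    # time unit; each main-chip core is assumed to do 1 work unit per time unit
    slave_chips: Dict[str, float] = field(default_factory=dict)

    def dispatch(self, kernel: Kernel) -> str:
        """Pick where a kernel runs: a matching slave chip, else the main chip."""
        if kernel.kind in self.slave_chips:
            return f"slave:{kernel.kind}"
        return "main"

    def estimated_time(self, kernel: Kernel) -> float:
        """Idealized run time, ignoring data movement between chips."""
        if kernel.kind in self.slave_chips:
            return kernel.work / self.slave_chips[kernel.kind]
        # Assume the work splits perfectly across all main-chip cores.
        return kernel.work / self.main_cores


if __name__ == "__main__":
    node = MassaNode(main_cores=64, slave_chips={"gemm": 512.0, "graph": 128.0})
    jobs = [
        Kernel("dense_solver", "gemm", 1024.0),
        Kernel("bfs", "graph", 256.0),
        Kernel("control_loop", "scalar", 64.0),
    ]
    for job in jobs:
        print(job.name, node.dispatch(job), node.estimated_time(job))
```

A per-kind lookup table is the simplest dispatch policy one could assume. It only illustrates why pairing general-purpose cores with customized hardware can serve diverse applications, and says nothing about how Massa actually schedules work.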

Keywords

High-performance computing; Beyond the exascale; Processor architecture; Application-customized hardware; Distributed computational resources
