
State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214125, China
Xiang-hui XIE, E-mail: xie.xianghui@meac-skl.cn
Xun JIA, E-mail: jia.xun@meac-skl.cn
Print publication date: 2018-10
Received: 2018-07-11
Revised: 2018-10-10
Xiang-hui XIE, Xun JIA. Exploring high-performance processor architecture beyond the exascale[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(10): 1224-1229. DOI: 10.1631/FITEE.1800424.
The ever-increasing need for high performance in scientific computation and engineering applications will push high-performance computing beyond the exascale. As the core components of a supercomputing system, high-performance processors and their architecture designs are crucial to improving system performance. This paper first introduces three architecture design goals for high-performance processors beyond the exascale: effective performance scaling, efficient resource utilization, and adaptation to diverse applications. It then proposes Massa, a high-performance many-core processor architecture with scalar processing and application-specific acceleration, in which a scalar many-core master chip is connected to application-acceleration slave chips. Massa aims to achieve the three goals by combining distributed computational resources with application-customized hardware. Finally, some future research directions for the Massa architecture are discussed.
Keywords: High-performance computing; Beyond the exascale; Processor architecture; Application-customized hardware; Distributed computational resources