1. School of Computer, National University of Defense Technology, Changsha 410073, China
2. Academy of Military Science, Beijing 100091, China
‡Corresponding author
Published: 2022-12
Received: 18 May 2022
Revised: 11 October 2022
JINSHU SU, BAOKANG ZHAO, YI DAI, et al. Technology trends in large-scale high-efficiency network computing[J]. Frontiers of Information Technology & Electronic Engineering, 2022, 23(12): 1733-1746. DOI: 10.1631/FITEE.2200217.
Network technology is the basis for large-scale high-efficiency network computing, such as supercomputing, cloud computing, big data processing, and artificial intelligence computing. The network technologies of network computing systems in different fields not only learn from each other but also undergo targeted design and optimization. Taking these fields together, this paper summarizes three development trends in their network technologies: integration, differentiation, and optimization. Integration reflects the absence of clear boundaries between the network technologies of different fields. Differentiation reflects unique solutions in different application fields, as well as innovative solutions driven by new application requirements. Optimization reflects techniques tailored to specific scenarios. This paper can help academic researchers identify future research directions and help industry practitioners build efficient, practical network systems.
Keywords: Supercomputing; Cloud computing; Network technology; Development trends