Minimizing transformer inference overhead using controlling element on Shenwei AI accelerator
Regular Papers | Updated: 2025-05-06
"In the field of natural language processing, this study addresses the computational overhead challenge in transformer models. The researchers developed a three-tier scheduling framework and a zero-copy memory management technique, laying a foundation for optimizing transformer models and improving inference efficiency on AI accelerators."
Frontiers of Information Technology & Electronic Engineering, Vol. 26, Issue 4, Pages: 605-622 (2025)
Affiliations:
1.State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214000, China
2.School of Non-Commissioned Officer, Space Engineering University, Beijing 100004, China
3.National Supercomputing Center in Wuxi, Wuxi 214000, China
4.Zhejiang Lab, Hangzhou 310000, China
5.National Research Centre of Parallel Computer Engineering and Technology, Beijing 100081, China
Yulong ZHAO, Chunzhi WU, Yizhuo WANG, et al. Minimizing transformer inference overhead using controlling element on Shenwei AI accelerator[J]. Frontiers of Information Technology & Electronic Engineering, 2025, 26(4): 605-622. DOI: 10.1631/FITEE.2400453.