Minimizing transformer inference overhead using controlling element on Shenwei AI accelerator
Regular Papers|Updated:2025-05-06
"This study presents progress in optimizing transformer models for natural language processing. The authors developed a three-tier scheduling framework and a zero-copy memory management technique, which significantly reduce inference overhead and enhance the efficiency of transformer models on AI accelerators."
Frontiers of Information Technology & Electronic Engineering, Vol. 26, Issue 4, Pages: 605-622 (2025)
Affiliations:
1.State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214000, China
2.School of Non-Commissioned Officer, Space Engineering University, Beijing 100004, China
3.National Supercomputing Center in Wuxi, Wuxi 214000, China
4.Zhejiang Lab, Hangzhou 310000, China
5.National Research Centre of Parallel Computer Engineering and Technology, Beijing 100081, China
Yulong ZHAO, Chunzhi WU, Yizhuo WANG, et al. Minimizing transformer inference overhead using controlling element on Shenwei AI accelerator[J]. Frontiers of information technology & electronic engineering, 2025, 26(4): 605-622. DOI: 10.1631/FITEE.2400453.