Minimizing transformer inference overhead using controlling element on Shenwei AI accelerator
Regular Papers|Updated:2025-05-06
"This study presents progress in optimizing transformer models for natural language processing. The authors developed a three-tier scheduling framework and a zero-copy memory management technique, which significantly reduce inference overhead and enhance the efficiency of transformer models on AI accelerators."
Frontiers of Information Technology & Electronic Engineering, Vol. 26, Issue 4, Pages: 605-622 (2025)
Affiliations:
1.State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214000, China
2.School of Non-Commissioned Officer, Space Engineering University, Beijing 100004, China
3.National Supercomputing Center in Wuxi, Wuxi 214000, China
4.Zhejiang Lab, Hangzhou 310000, China
5.National Research Centre of Parallel Computer Engineering and Technology, Beijing 100081, China
Yulong ZHAO, Chunzhi WU, Yizhuo WANG, et al. Minimizing transformer inference overhead using controlling element on Shenwei AI accelerator[J]. Frontiers of information technology & electronic engineering, 2025, 26(4): 605-622. DOI: 10.1631/FITEE.2400453.