FastCheck: fast checkpointing and recovery for DNN training via parallel transmission and compression
Regular Papers | Updated: 2026-02-11
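    • FastCheck: a method for fast checkpoint saving and recovery in deep neural network training based on parallel transmission and customized compression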
    • Training large-scale deep neural networks (DNNs) is prone to software and hardware failures, with critical failures often requiring full-machine reboots that substantially prolong training. In this paper, we propose FastCheck, a checkpoint–recovery framework that accelerates checkpointing and recovery through parallel transmission and tailored compression.
    • ENGINEERING Information Technology & Electronic Engineering, Vol. 27, Issue 2, Pages: 1-13 (2026)
    • DOI:10.1631/ENG.ITEE.2025.0034    

      CLC: TP302
    • Received: 2025-09-12

      Revised: 2026-01-23

      Published: 2026-02


  • Yun TENG, Dawei SUN, Shipeng HU, et al. FastCheck: fast checkpointing and recovery for DNN training via parallel transmission and compression[J]. ENGINEERING Information Technology & Electronic Engineering, 2026, 27(2): 1-13. DOI: 10.1631/ENG.ITEE.2025.0034.
