

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
University of Chinese Academy of Sciences, Beijing 100049, China
CHENG Jian, E-mail: jcheng@nlpr.ia.ac.cn
[ "WANG Pei-song, E-mail:peisong.wang@nlpr.ia.ac.cn" ]
[ "LI Gang, E-mail: gang.li@nlpr.ia.ac.cn" ]
[ "HU Qing-hao, E-mail: qinghao.hu@nlpr.ia.ac.cn" ]
Received: 25 November 2017; Revised: 26 January 2018; Published: January 2018
Jian CHENG, Pei-song WANG, Gang LI, et al. Recent advances in efficient computation of deep convolutional neural networks[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(1): 64-77. DOI: 10.1631/FITEE.1700789.
Deep neural networks have evolved remarkably over the past few years, and they are currently the fundamental tools of many intelligent systems. At the same time, the computational complexity and resource consumption of these networks continue to increase. This poses a significant challenge to the deployment of such networks, especially in real-time applications or on resource-limited devices. Thus, network acceleration has become a hot topic within the deep learning community. As for hardware implementation of deep neural networks, a batch of accelerators based on field-programmable gate arrays (FPGAs) or application-specific integrated circuits (ASICs) have been proposed in recent years. In this paper, we provide a comprehensive survey of recent advances in network acceleration, compression, and accelerator design from both algorithm and hardware points of view. Specifically, we provide a thorough analysis of each of the following topics: network pruning, low-rank approximation, network quantization, teacher-student networks, compact network design, and hardware accelerators. Finally, we introduce and discuss a few possible future directions.
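Two of the techniques the abstract names, network pruning and network quantization, can be illustrated with a minimal sketch. The snippet below is not from the surveyed paper; it is a generic, simplified illustration (unstructured magnitude pruning and symmetric uniform quantization simulated in floating point), with the function names `magnitude_prune` and `uniform_quantize` chosen here for exposition.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitudes
    (unstructured magnitude-based pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

def uniform_quantize(weights, num_bits):
    """Symmetric uniform quantization: map weights onto a (2^b - 1)-level
    integer grid, then dequantize to simulate the precision loss."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q * scale

# Toy usage: prune half the weights, quantize the rest to 8 bits
w = np.random.randn(4, 4)
w_sparse = magnitude_prune(w, sparsity=0.5)
w_low_precision = uniform_quantize(w_sparse, num_bits=8)
```

In practice the surveyed methods combine such steps with retraining or fine-tuning to recover accuracy, which this sketch omits.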