School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China
‡Corresponding author
Print publication date: 2023-11-0
Online publication date: 2023-09-07
Received: 2022-10-27
Accepted: 2023-04-20
YAN Han, ZHONG Chongquan, WU Yuhu, et al., 2023. A hybrid-model optimization algorithm based on the Gaussian process and particle swarm optimization for mixed-variable CNN hyperparameter automatic search. Frontiers of Information Technology & Electronic Engineering, 24(11):1557-1573. https://doi.org/10.1631/FITEE.2200515
Convolutional neural networks (CNNs) have developed rapidly in many real-world fields. However, CNN performance depends heavily on its hyperparameters, and finding suitable hyperparameters for CNNs in application fields is challenging for three reasons: (1) the mixed-variable encoding problem for the different types of CNN hyperparameters; (2) the expensive computational cost of evaluating candidate hyperparameter configurations; and (3) the problem of ensuring the convergence rate and model performance during the hyperparameter search. To overcome these challenges, a hybrid-model optimization algorithm based on the Gaussian process and particle swarm optimization (GPPSO) is proposed to search for suitable hyperparameter configurations automatically. First, a new encoding method is designed to deal efficiently with the mixed-variable nature of CNN hyperparameters. Second, a hybrid-surrogate-assisted model is proposed to reduce the high cost of evaluating candidate hyperparameter configurations. Third, a novel activation function is suggested to improve model performance and ensure the convergence rate. Intensive experiments on image-classification benchmark datasets demonstrate the superior performance of GPPSO over state-of-the-art methods. Moreover, a case study on metal fracture diagnosis evaluates the GPPSO algorithm in a practical application. Experimental results demonstrate the effectiveness and efficiency of GPPSO, which achieves accuracies of 95.26% and 76.36% with only 0.04 and 1.70 GPU days on the CIFAR-10 and CIFAR-100 datasets, respectively.
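To make the idea concrete, the sketch below shows a toy surrogate-assisted PSO loop in the spirit of the approach the abstract describes: a Gaussian-process surrogate, trained on all configurations evaluated so far, pre-screens the swarm so that the expensive evaluation (here a cheap stand-in function; in the paper, training a CNN) is called only for the most promising particle per iteration, and a simple `decode` step illustrates mixed-variable handling by rounding one dimension to an integer choice. This is an illustrative assumption-laden sketch, not the authors' GPPSO implementation; all names, bounds, and settings are invented for the example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def expensive_eval(x):
    """Stand-in for training a CNN and returning its validation error."""
    return float(np.sum((x - 0.3) ** 2))

def decode(x):
    """Toy mixed-variable decoding: dim 0 stays continuous (e.g. a learning
    rate), dim 1 is rounded to an integer choice (e.g. a kernel-size index)."""
    return np.array([x[0], round(x[1])], dtype=float)

dim, n_particles, iters = 2, 8, 15
lo, hi = np.zeros(dim), np.full(dim, 4.0)

# Initialize the swarm and evaluate every particle once (the expensive phase).
pos = rng.uniform(lo, hi, (n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([expensive_eval(decode(p)) for p in pos])
g = int(np.argmin(pbest_f))
gbest, gbest_f = pbest[g].copy(), pbest_f[g]

# History of truly evaluated configurations, used to fit the GP surrogate.
X_hist, y_hist = [decode(p) for p in pos], list(pbest_f)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

w, c1, c2 = 0.6, 1.5, 1.5  # standard PSO inertia and acceleration weights
for _ in range(iters):
    gp.fit(np.array(X_hist), np.array(y_hist))
    r1, r2 = rng.random((2, n_particles, dim))
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    # Surrogate pre-screening: only the particle the GP predicts as most
    # promising pays for a real (expensive) evaluation this iteration.
    mu = gp.predict(np.array([decode(p) for p in pos]))
    i = int(np.argmin(mu))
    f = expensive_eval(decode(pos[i]))
    X_hist.append(decode(pos[i]))
    y_hist.append(f)
    if f < pbest_f[i]:
        pbest[i], pbest_f[i] = pos[i].copy(), f
        if f < gbest_f:
            gbest, gbest_f = pos[i].copy(), f

print(round(gbest_f, 3))
```

Note the budget: the expensive function is called n_particles + iters times in total, rather than n_particles * iters as in plain PSO; this is the kind of saving a surrogate-assisted loop is meant to deliver.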
Key words: Convolutional neural network; Gaussian process; Hybrid model; Hyperparameter optimization; Mixed-variable; Particle swarm optimization