Representation learning via a semi-supervised stacked distance autoencoder for image classification

Liang HOU; Xiao-yi LUO; Zi-yang WANG; Jun LIANG

doi:10.1631/FITEE.1900116


Regular Papers | Updated：2022-05-19

- Frontiers of Information Technology & Electronic Engineering Vol. 21, Issue 7, Pages: 1005-1018(2020)
- Affiliations: College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China
- Author bio: Jun LIANG, E-mail: jliang@zju.edu.cn
- DOI: 10.1631/FITEE.1900116
- CLC: TP391.9
- Received: 28 February 2019; Revised: 2020-06-10; Published: 2020-07
Liang HOU, Xiao-yi LUO, Zi-yang WANG, et al. Representation learning via a semi-supervised stacked distance autoencoder for image classification[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(7): 1005-1018. DOI： 10.1631/FITEE.1900116.

Abstract

Image classification is an important application of deep learning. In a typical classification task, the classification accuracy is strongly related to the features that are extracted via deep learning methods. An autoencoder is a special type of neural network, often used for dimensionality reduction and feature extraction. The proposed method is based on the traditional autoencoder, incorporating the "distance" information between samples from different categories. The model is called a semi-supervised distance autoencoder. Each layer is first pre-trained in an unsupervised manner. In the subsequent supervised training, the optimized parameters are set as the initial values. To obtain more suitable features, we use a stacked model to replace the basic autoencoder structure with a single hidden layer. A series of experiments are carried out to test the performance of different models on several datasets, including the MNIST dataset, street view house numbers (SVHN) dataset, German traffic sign recognition benchmark (GTSRB), and CIFAR-10 dataset. The proposed semi-supervised distance autoencoder method is compared with the traditional autoencoder, sparse autoencoder, and supervised autoencoder. Experimental results verify the effectiveness of the proposed model.
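
The abstract does not state the loss function, so the following is only a minimal sketch of one way a "distance" term between samples from different categories could be combined with an autoencoder's reconstruction error. Everything in it is an assumption made for illustration: the sigmoid layers, the mean squared reconstruction error, the pairwise squared Euclidean distance on hidden codes, the weight `lam`, and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encode(x, W, b):
    # One encoder layer: maps inputs to hidden codes.
    return sigmoid(x @ W + b)

def decode(h, W, b):
    # One decoder layer: maps hidden codes back to input space.
    return sigmoid(h @ W + b)

def stacked_encode(x, layers):
    # Apply several encoder layers in sequence (a stacked encoder).
    # `layers` is a list of (W, b) pairs, e.g. obtained from
    # layer-wise unsupervised pre-training.
    h = x
    for W, b in layers:
        h = encode(h, W, b)
    return h

def inter_class_distance(h, labels):
    # Mean squared Euclidean distance between hidden codes of
    # samples whose labels differ (assumed form of the distance term).
    diff = h[:, None, :] - h[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    mask = labels[:, None] != labels[None, :]
    if not mask.any():
        return 0.0
    return float(sq[mask].mean())

def distance_autoencoder_loss(x, labels, params, lam=0.1):
    # Assumed objective: reconstruction error minus a weighted
    # inter-class distance, so that minimizing the loss pushes
    # codes of different categories apart.
    W_enc, b_enc, W_dec, b_dec = params
    h = encode(x, W_enc, b_enc)
    x_hat = decode(h, W_dec, b_dec)
    recon = float(np.mean((x - x_hat) ** 2))
    dist = inter_class_distance(h, labels)
    return recon - lam * dist, recon, dist

# Small usage example on random data (values are illustrative only).
rng = np.random.default_rng(0)
x = rng.random((6, 8))
labels = np.array([0, 0, 1, 1, 2, 2])
params = (
    rng.normal(scale=0.1, size=(8, 4)), np.zeros(4),
    rng.normal(scale=0.1, size=(4, 8)), np.zeros(8),
)
loss, recon, dist = distance_autoencoder_loss(x, labels, params, lam=0.1)
print(f"loss={loss:.4f} recon={recon:.4f} dist={dist:.4f}")
```

In a semi-supervised setting, one plausible arrangement is to minimize only the reconstruction term during layer-wise pre-training and to add the distance term once labels are used, but the exact training schedule should be taken from the full paper rather than from this sketch.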
