SPSSNet: a real-time network for image semantic segmentation

Saqib MAMOON; Muhammad Arslan MANZOOR; Fa-en ZHANG; Zakir ALI; Jian-feng LU

doi:10.1631/FITEE.1900697


Regular Papers | Updated：2022-05-19

- Frontiers of Information Technology & Electronic Engineering, Vol. 21, Issue 12, Pages 1770-1782 (2020)
- Affiliations：
  
  School of Computer Science and Engineering, Nanjing University of Science & Technology, Nanjing 210094, China
  AInnovation, Beijing 100080, China
- Author bio：
  
  Saqib MAMOON, E-mail: saqibmamoon@njust.edu.cn
  Muhammad Arslan MANZOOR, E-mail: arsalaan@njust.edu.cn
  Fa-en ZHANG, E-mail: zhangfaen@ainnovation.com
  Zakir ALI, E-mail: alizakir@njust.edu.cn
  Jian-feng LU, E-mail: lujf@njust.edu.cn
- DOI：10.1631/FITEE.1900697
Saqib MAMOON, Muhammad Arslan MANZOOR, Fa-en ZHANG, et al. SPSSNet: a real-time network for image semantic segmentation [J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(12): 1770-1782. DOI: 10.1631/FITEE.1900697.

Abstract

Although deep neural networks (DNNs) have achieved great success in semantic segmentation tasks, it is still challenging for real-time applications. A large number of feature channels, parameters, and floating-point operations make the network sluggish and computationally heavy, which is not desirable for real-time tasks such as robotics and autonomous driving. Most approaches, however, usually sacrifice spatial resolution to achieve inference speed in real time, resulting in poor performance. In this paper, we propose a light-weight stage-pooling semantic segmentation network (SPSSN), which can efficiently reuse the paramount features from early layers at multiple stages, at different spatial resolutions. SPSSN takes input of full resolution $2048 \times 1024$ pixels, uses only $1.42 \times 10^6$ parameters, yields 69.4% mIoU accuracy without pre-training, and obtains an inference speed of 59 frames/s on the Cityscapes dataset. SPSSN can run directly on mobile devices in real time, due to its light-weight architecture. To demonstrate the effectiveness of the proposed network, we compare our results with those of state-of-the-art networks.
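The accuracy figure above is a mean intersection-over-union (mIoU). As a reading aid, the following is a minimal sketch of how such a score is commonly computed from flattened per-pixel label lists. It is not the authors' evaluation code: the function name `mean_iou`, the `ignore_index` value of 255, and the choice to average only over classes that appear in the prediction or the ground truth are all assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that occur in pred or target.

    pred, target: flat sequences of integer class ids, one per pixel.
    ignore_index: label for unlabeled pixels (assumed value, not from the paper).
    """
    intersection = [0] * num_classes  # true positives per class
    union = [0] * num_classes         # TP + FP + FN per class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled ground-truth pixels are skipped
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            union[p] += 1  # false positive for the predicted class
            union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes and one ignored pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark tooling usually accumulates the intersection and union counts over the whole evaluation set before dividing, rather than averaging per-image scores.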

Keywords

Real-time semantic segmentation; Stage-pooling; Feature reuse
