1. Department of Computer Science, Durham University, Durham DH1 3LE, UK
2. Department of Computer Science and Information Technology, La Trobe University, VIC 3086, Australia
3. Alibaba Group, Hangzhou 311121, China
4. Department of Digital Media, Zhejiang University, Hangzhou 310027, China
5. International Design Institute, Zhejiang University, Hangzhou 310058, China
E-mail: yunzhan.zhou@durham.ac.uk
‡Corresponding author
Received: 2020-07-03
Accepted: 2021-02-15
Print publication date: 2022-01-0
Citation: Yunzhan Zhou, Tian Feng, Shihui Shuai, et al. EDVAM: a 3D eye-tracking dataset for visual attention modeling in a virtual museum. Frontiers of Information Technology & Electronic Engineering, 2022, 23(1): 101-112. DOI: 10.1631/FITEE.2000318
Predicting visual attention facilitates an adaptive virtual museum environment and provides a context-aware and interactive user experience. Explorations of visual attention mechanisms using eye-tracking data have so far been limited to 2D scenes, and researchers have yet to approach this topic in a 3D virtual environment from a spatiotemporal perspective. To this end, we present the first 3D Eye-tracking Dataset for Visual Attention modeling in a virtual Museum, named EDVAM. In addition, a deep learning model is devised and tested with EDVAM to predict a user's subsequent visual attention from previous eye movements. This work provides a reference for visual attention modeling and context-aware interaction in the context of virtual museums.
Key words: Visual attention; Virtual museums; Eye-tracking datasets; Gaze detection; Deep learning
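The abstract does not specify the architecture of the deep learning model, so the following is only a minimal sketch of the general idea: a recurrent network that consumes a short history of 3D gaze points and classifies the attention region a user is likely to look at next. The window length, hidden size, and number of regions (WINDOW, NUM_REGIONS) are hypothetical parameters chosen for illustration, not values taken from the paper.

```python
# Minimal sketch (not the authors' exact model): predict the next visual-attention
# region from a window of past 3D gaze points using an LSTM classifier.
import torch
import torch.nn as nn

NUM_REGIONS = 8   # assumed number of attention regions (areas of interest)
WINDOW = 30       # assumed number of past gaze samples per input sequence
FEATURES = 3      # 3D gaze coordinates (x, y, z) per sample

class GazeLSTM(nn.Module):
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(FEATURES, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, NUM_REGIONS)

    def forward(self, gaze_seq: torch.Tensor) -> torch.Tensor:
        # gaze_seq: (batch, WINDOW, FEATURES) -> logits over the next attention region
        _, (h_n, _) = self.lstm(gaze_seq)
        return self.head(h_n[-1])

if __name__ == "__main__":
    model = GazeLSTM()
    dummy = torch.randn(4, WINDOW, FEATURES)   # a batch of 4 synthetic gaze histories
    logits = model(dummy)
    print(logits.argmax(dim=1))                # predicted next region per sample
```

In practice such a model would be trained with cross-entropy loss on gaze sequences labeled with the region fixated at the next time step; the sketch only shows the forward pass under these assumptions.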