1. Institute of Computer Science and Technology, Peking University, Beijing 100871, China
2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
3. Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
4. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
5. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
6. Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
7. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
E-mail: pengyuxin@pku.edu.cn
E-mail: wwzhu@tsinghua.edu.cn
Published: 2017-01
Received: 2016-12-07
Accepted: 2016-12-30
Yu-xin Peng, Wen-wu Zhu, Yao Zhao, et al. Cross-media analysis and reasoning: advances and directions [J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(1): 44-57. DOI: 10.1631/FITEE.1601787.
Cross-media analysis and reasoning is an active research area in computer science and a promising direction for artificial intelligence. However, to the best of our knowledge, no existing work has summarized the state-of-the-art methods for cross-media analysis and reasoning or presented the advances, challenges, and future directions of the field. To address these issues, we provide an overview covering seven aspects: (1) theory and model for cross-media uniform representation; (2) cross-media correlation understanding and deep mining; (3) cross-media knowledge graph construction and learning methodologies; (4) cross-media knowledge evolution and reasoning; (5) cross-media description and generation; (6) cross-media intelligent engines; and (7) cross-media intelligent applications. By presenting approaches, advances, and future directions in cross-media analysis and reasoning, our goal is not only to draw more attention to the state-of-the-art advances in the field, but also to provide technical insights by discussing the challenges and research directions in these areas.
Keywords: Cross-media analysis; Cross-media reasoning; Cross-media applications