

1 Institute of Computer Science and Technology, Peking University, Beijing 100871, China
2 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
3 Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China
4 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
5 Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
6 Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China
7 School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
E-mail: pengyuxin@pku.edu.cn
E-mail: wwzhu@tsinghua.edu.cn
Received: 07 December 2016
Accepted: 30 December 2016
Published: 2017-01
Yu-xin PENG, Wen-wu ZHU, Yao ZHAO, et al. Cross-media analysis and reasoning: advances and directions[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(1): 44-57. DOI: 10.1631/FITEE.1601787.
Cross-media analysis and reasoning is an active research area in computer science and a promising direction for artificial intelligence. However, to the best of our knowledge, no existing work has summarized the state-of-the-art methods for cross-media analysis and reasoning or presented advances, challenges, and future directions for the field. To address these issues, we provide an overview as follows: (1) theory and model for cross-media uniform representation; (2) cross-media correlation understanding and deep mining; (3) cross-media knowledge graph construction and learning methodologies; (4) cross-media knowledge evolution and reasoning; (5) cross-media description and generation; (6) cross-media intelligent engines; and (7) cross-media intelligent applications. By presenting approaches, advances, and future directions in cross-media analysis and reasoning, our goal is not only to draw more attention to the state-of-the-art advances in the field, but also to provide technical insights by discussing the challenges and research directions in these areas.