1 School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3 Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
4 School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu 611730, China
5 Department of Electronic Engineering and Information Sciences, University of Science and Technology of China, Hefei 230027, China
6 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
7 School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China
E-mail: yhtian@pku.edu.cn
E-mail: tjhuang@pku.edu.cn
Print publication date: 2017-01
Received: 2016-12-12
Accepted: 2016-12-26
Yong-hong Tian, Xi-lin Chen, Hong-kai Xiong, et al. Towards human-like and transhuman perception in AI 2.0: a review[J]. Frontiers of Information Technology & Electronic Engineering, 2017, 18(1): 58-67. DOI: 10.1631/FITEE.1601804.
Perception is the interface through which an intelligent system interacts with the real world. Without sophisticated and flexible perceptual capabilities, it is impossible to create advanced artificial intelligence (AI) systems. For the next-generation AI, called 'AI 2.0' (a concept recently proposed by Academician Yunhe Pan), one of the most significant features will be that AI is empowered with intelligent perceptual capabilities, which can simulate the human brain's mechanisms and are likely to surpass the human brain in terms of performance. In this paper, we briefly review the state-of-the-art advances across different areas of perception, including visual perception, auditory perception, speech perception, and perceptual information processing and learning engines. On this basis, we envision several R&D trends in intelligent perception for the forthcoming era of AI 2.0, including: (1) human-like and transhuman active vision; (2) auditory perception and computation in an actual auditory setting; (3) speech perception and computation in a natural interaction setting; (4) autonomous learning of perceptual information; (5) large-scale perceptual information processing and learning platforms; and (6) urban omnidirectional intelligent perception and reasoning engines. We believe these research directions should be highlighted in the future plans for AI 2.0.
Keywords: Intelligent perception; Active vision; Auditory perception; Speech perception; Autonomous learning