Institute of Computer Graphics and Knowledge Visualisation, Graz University of Technology, Graz 8010, Austria
School of Computer Science and Engineering, Nanyang Technological University, Singapore 639798, Singapore
InfoVis Group, University of British Columbia, Vancouver V6T1Z4, Canada
Max Planck Institute for Meteorology, Hamburg 20146, Germany
Institute of Interactive Systems and Data Science, Graz University of Technology, Graz 8010, Austria
Mohammad CHEGINI, m.chegini@cgv.tugraz.at
Jürgen BERNARD, jubernar@cs.ubc.ca
Alexei SOURIN, assourin@ntu.edu.sg
Keith ANDREWS, kandrews@tugraz.at
Print publication date: 2020-04
Received: 2019-10-06
Revised: 2020-01-30
CHEGINI Mohammad, BERNARD Jürgen, CUI Jian, et al. Interactive visual labelling versus active learning: an experimental comparison[J]. Frontiers of Information Technology & Electronic Engineering, 2020, 21(4): 524-535. DOI: 10.1631/FITEE.1900549.
Methods from supervised machine learning allow new data to be classified automatically and are tremendously helpful for data analysis. The quality of supervised machine learning depends not only on the type of algorithm used, but also on the quality of the labelled dataset used to train the classifier. Labelling instances in a training dataset is often done manually, relying on selections and annotations by expert analysts, and is often a tedious and time-consuming process. Active learning algorithms can automatically determine a subset of data instances for which labels would provide useful input to the learning process. Interactive visual labelling techniques are a promising alternative, providing effective visual overviews from which an analyst can simultaneously explore data records and select items to label. By putting the analyst in the loop, higher accuracy can be achieved in the resulting classifier. While initial results of interactive visual labelling techniques are promising, in the sense that user labelling can improve supervised learning, many aspects of these techniques are still largely unexplored. This paper presents a study conducted using the mVis tool to compare three interactive visualisations, namely similarity map, scatterplot matrix (SPLOM), and parallel coordinates, with each other and with active learning for the purpose of labelling a multivariate dataset. The results show that all three interactive visual labelling techniques surpass active learning algorithms in terms of classifier accuracy, and that users subjectively prefer the similarity map over SPLOM and parallel coordinates for labelling. Users also employ different labelling strategies depending on the visualisation used.
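As a rough illustration of the active learning idea mentioned in the abstract (automatically choosing which instances to have labelled next), the following is a minimal sketch of pool-based active learning with smallest-margin uncertainty sampling. It is not the procedure used in the study: the nearest-centroid classifier, the distance-based pseudo-probabilities, and all function names and data values are assumptions made purely for illustration. The sketch also assumes the seed set already contains at least two classes.

```python
import math


def fit_centroids(labelled):
    """Compute one mean point per class from (point, label) pairs."""
    groups = {}
    for point, label in labelled:
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coords) / len(points) for coords in zip(*points))
        for label, points in groups.items()
    }


def class_probabilities(point, centroids):
    """Illustrative pseudo-probabilities: softmax over negative distances."""
    scores = {
        label: math.exp(-math.dist(point, centroid))
        for label, centroid in centroids.items()
    }
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def smallest_margin_query(pool, centroids):
    """Index of the pool instance whose two most likely classes are closest."""
    def margin(point):
        probs = sorted(class_probabilities(point, centroids).values(), reverse=True)
        return probs[0] - probs[1]
    return min(range(len(pool)), key=lambda i: margin(pool[i]))


def active_learning_loop(labelled, pool, oracle, budget):
    """Repeatedly refit, query the most ambiguous instance, and ask the oracle."""
    labelled, pool = list(labelled), list(pool)
    for _ in range(min(budget, len(pool))):
        centroids = fit_centroids(labelled)
        index = smallest_margin_query(pool, centroids)
        point = pool.pop(index)
        labelled.append((point, oracle(point)))
    return labelled, pool


# Hypothetical toy data: two seed labels and three unlabelled points.
seed = [((0.0, 0.0), "a"), ((4.0, 4.0), "b")]
pool = [(0.5, 0.2), (2.1, 1.9), (3.8, 4.1)]
oracle = lambda p: "a" if p[0] < 2 else "b"  # stands in for the human analyst

labelled, remaining = active_learning_loop(seed, pool, oracle, budget=1)
```

With this toy data the first query falls on the point lying roughly midway between the two seed instances, since that is where the stand-in classifier is least certain. In interactive visual labelling, by contrast, the analyst picks the instance to label from a visualisation rather than having it selected by such a criterion.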
Keywords: Interactive visual labelling; Active learning; Visual analytics