FOLLOWUS
Department of Computer Science and Technology, Ocean University of China, Qingdao 266100, China
Science and Information College, Qingdao Agricultural University, Qingdao 266109, China
Guo-qiang ZHONG, E-mail: gqzhong@ouc.edu.cn
纸质出版日期:2018-11,
收稿日期:2017-04-17,
修回日期:2018-11-27,
Scan QR Code
张琴, 仲国强, 董军宇. 一种基于锚点的谱聚类方法[J]. 信息与电子工程前沿(英文), 2018,19(11):1385-1396.
QIN ZHANG, GUO-QIANG ZHONG, JUN-YU DONG. An anchor-based spectral clustering method. [J]. Frontiers of information technology & electronic engineering, 2018, 19(11): 1385-1396.
张琴, 仲国强, 董军宇. 一种基于锚点的谱聚类方法[J]. 信息与电子工程前沿(英文), 2018,19(11):1385-1396. DOI: 10.1631/FITEE.1700262.
QIN ZHANG, GUO-QIANG ZHONG, JUN-YU DONG. An anchor-based spectral clustering method. [J]. Frontiers of information technology & electronic engineering, 2018, 19(11): 1385-1396. DOI: 10.1631/FITEE.1700262.
谱聚类是模式识别、机器学习和数据挖掘中最流行最重要的聚类方法之一。然而,高计算复杂度妨碍了谱聚类在大规模数据集的应用。对于具有
n
个样本的聚类问题,谱聚类需
O
(
n
3
)时间复杂度计算图拉普拉斯矩阵特征向量。为解决该问题,提出一种新的基于锚点谱聚类方法(anchor-based spectral clustering,ASC)。首先,在数据集中选择
m
(
m
≪
n
)个可以基本保持原始数据内在(流形)结构的锚点。然后,构造原始数据与锚点之间的映射矩阵,并证明该映射矩阵能保持数据的聚类结构。基于该映射矩阵,近似得到原始数据谱嵌入。ASC方法复杂度与数据集大小呈线性关系。将该方法与经典谱聚类方法和两种最新谱聚类加速方法,即能量迭代聚类(power iteration clustering,PIC)和基于地标聚类(landmark-based spectral clustering,LSC),在10个真实数据集上比较。实验结果表明,ASC算法比经典谱聚类算法具有更快聚类速度,在效率和有效性上与现有方法相当或优于现有方法。
Spectral clustering is one of the most popular and important clustering methods in pattern recognition
machine learning
and data mining. However
its high computational complexity limits it in applications involving truly large-scale datasets. For a clustering problem with
$$n$$
samples
it needs to compute the eigenvectors of the graph Laplacian with
$$O(n^3)$$
time complexity. To address this problem
we propose a novel method called anchor-based spectral clustering (ASC) by employing anchor points of data. Specifically
$$m$$
(
$$m \ll n$$
) anchor points are selected from the dataset
which can basically maintain the intrinsic (manifold) structure of the original data. Then a mapping matrix between the original data and the anchors is constructed. More importantly
it is proved that this data-anchor mapping matrix essentially preserves the clustering structure of the data. Based on this mapping matrix
it is easy to approximate the spectral embedding of the original data. The proposed method scales linearly relative to the size of the data but with low degradation of the clustering performance. The proposed method
ASC
is compared to the classical spectral clustering and two state-of-the-art accelerating methods
i.e.
power iteration clustering and landmark-based spectral clustering
on 10 real-world applications under three evaluation metrics. Experimental results show that ASC is consistently faster than the classical spectral clustering with comparable clustering performance
and at least comparable with or better than the state-of-the-art methods on both effectiveness and efficiency.
聚类谱聚类图拉普拉斯锚点
ClusteringSpectral clusteringGraph LaplacianAnchors
D Arthur, , , S Vassilvitskii. . K-means++: the advantages of careful seeding. . 18th Annual ACM-SIAM Symp on Discrete Algorithms, , 2007. . p.1027--1035. . DOI:10.1145/1283383.1283494http://doi.org/10.1145/1283383.1283494..
C Boutsidis, , , A Gittens, , , P Kambadur. . Spectral clustering via the power method—provably. . 32nd Int Conf on Machine Learning, , 2015. . p.40--48. . ..
XJ Chang, , , FP Nie, , , ZG Ma, , , 等. . A convex formulation for spectral shrunk clustering. . 29th AAAI Conf on Artificial Intelligence, , 2015. . p.2532--2538. . ..
WY Chen, , , YQ Song, , , HJ Bai, , , 等. . Parallel spectral clustering in distributed systems. . IEEE Trans Patt Anal Mach Intell, , 2011. . 33((3):):568--586. . DOI:10.1109/TPAMI.2010.88http://doi.org/10.1109/TPAMI.2010.88..
XL Chen, , , D Cai. . Large scale spectral clustering with landmark-based representation. . 25th AAAI Conf on Artificial Intelligence, , 2011. . p.313--318. . ..
DL Davies, , , DW Bouldin. . A cluster separation measure. . IEEE Trans Patt Anal Mach Intell, , 1979. . 1((2):):224--227. . DOI:10.1109/TPAMI.1979.4766909http://doi.org/10.1109/TPAMI.1979.4766909..
O Delalleau, , , Y Bengio, , , N Le Roux. . Efficient nonparametric function induction in semi-supervised learning. . 10th Int Workshop on Artificial Intelligence and Statistics, , 2005. . p.96--103. . ..
J Demšar. . Statistical comparisons of classifiers over multiple data sets. . J Mach Learn Res, , 2006. . 71--30. . ..
C Fowlkes, , , S Belongie, , , F Chung, , , 等. . Spectral grouping using the Nyström method. . IEEE Trans Patt Anal Mach Intell, , 2004. . 26((2):):214--225. . DOI:10.1109/TPAMI.2004.1262185http://doi.org/10.1109/TPAMI.2004.1262185..
HJ Jia, , , SF Ding, , , XZ Xu, , , 等. . The latest research progress on spectral clustering. . Neur Comput Appl, , 2014. . 24((7-8):):1477--1486. . DOI:10.1007/s00521-013-1439-2http://doi.org/10.1007/s00521-013-1439-2..
HZ Li, , , XG Hu, , , YJ Lin, , , 等. . A social tag clustering method based on common co-occurrence group similarity. . Front Inform Technol Electron Eng, , 2016. . 17((2):):122--134. . DOI:10.1631/FITEE.1500187http://doi.org/10.1631/FITEE.1500187..
JY Li, , , YJ Xia, , , ZY Shan, , , 等. . Scalable constrained spectral clustering. . IEEE Trans Knowl Data Eng, , 2015. . 27((2):):589--593. . DOI:10.1109/TKDE.2014.2356471http://doi.org/10.1109/TKDE.2014.2356471..
F Lin, , , WW Cohen. . Power iteration clustering. . 27th Int Conf on Machine Learning, , 2010. . p.655--662. . ..
JL Liu, , , C Wang, , , M Danilevsky, , , 等. . Large-scale spectral clustering on graphs. . 23rd Int Joint Conf on Artificial Intelligence, , 2013. . p.1486--1492. . ..
W Liu, , , JF He, , , SF Chang. . Large graph construction for scalable semi-supervised learning. . 27th Int Conf on Machine Learning, , 2010. . p.679--686. . ..
MN Luo, , , FP Nie, , , XJ Chang, , , 等. . Adaptive unsupervised feature selection with structure regularization. . IEEE Trans Neur Netw Learn Syst, , 2017. . 29((4):):944--956. . DOI:10.1109/TNNLS.2017.2650978http://doi.org/10.1109/TNNLS.2017.2650978..
R Mall, , , R Langone, , , JAK Suykens. . FURS: fast and unique representative subset selection retaining large-scale community structure. . Soc Netw Anal Min, , 2013a. . 3((4):):1075--1095. . DOI:10.1007/s13278-013-0144-6http://doi.org/10.1007/s13278-013-0144-6..
R Mall, , , R Langone, , , JAK Suykens. . Kernel spectral clustering for big data networks. . Entropy, , 2013b. . 15((5):):1567--1586. . DOI:10.3390/e15051567http://doi.org/10.3390/e15051567..
R Mall, , , V Jumutc, , , R Langone, , , 等. . Representative subsets for big data learning using K-NN graphs. . IEEE Int Conf on Big Data, , 2014. . p.37--42. . DOI:10.1109/BigData.2014.7004210http://doi.org/10.1109/BigData.2014.7004210..
AY Ng, , , MI Jordan, , , Y Weiss, , , 等. . On spectral clustering: analysis and an algorithm. . Advances in Neural Information Processing Systems, , 2002. . p.849--856. . ..
JB Shi, , , J Malik. . Normalized cuts and image segmentation. . IEEE Trans Patt Anal Mach Intell, , 2000. . 22((8):):888--905. . DOI:10.1109/34.868688http://doi.org/10.1109/34.868688..
YQ Song, , , WY Chen, , , HJ Bai, , , 等. . Parallel spectral clustering. In: Daelemans W, Goethals B, Morik K (Eds.), Machine Learning and Knowledge Discovery in Databases.. . Springer Berlin Heidelberg, , 2008. . p.374--389. . DOI:10.1007/978-3-540-87481-2_25http://doi.org/10.1007/978-3-540-87481-2_25..
F Tian, , , B Gao, , , Q Cui, , , 等. . Learning deep representations for graph clustering. . 28th AAAI Conf on Artificial Intelligence, , 2014. . p.1293--1299. . ..
U von Luxburg. . A tutorial on spectral clustering. . Stat Comput, , 2007. . 17((4):):395--416. . DOI:10.1007/s11222-007-9033-zhttp://doi.org/10.1007/s11222-007-9033-z..
L Wang, , , C Leckie, , , K Ramamohanarao, , , 等. . Approximate spectral clustering. In: Theeramunkong T, Kijsirikul B, Cercone N, et al. (Eds.), Advances in Knowledge Discovery and Data Mining.. . Springer Berlin Heidelberg, , 2009. . p.134--146. . DOI:10.1007/978-3-642-01307-2_15http://doi.org/10.1007/978-3-642-01307-2_15..
RK Xia, , , Y Pan, , , L Du, , , 等. . Robust multi-view spectral clustering via low-rank and sparse decomposition. . 28th AAAI Conf on Artificial Intelligence, , 2014. . p.2149--2155. . ..
T Xiang, , , SG Gong. . Spectral clustering with eigenvector selection. . Patt Recog, , 2008. . 41((3):):1012--1029. . DOI:10.1016/j.patcog.2007.07.023http://doi.org/10.1016/j.patcog.2007.07.023..
P Xiao, , , ZY Li, , , S Guo, , , 等. . A K self-adaptive SDN controller placement for wide area networks. . Front Inform Technol Electron Eng, , 2016. . 17((7):):620--633. . DOI:10.1631/FITEE.1500350http://doi.org/10.1631/FITEE.1500350..
DH Yan, , , L Huang, , , MI Jordan. . Fast approximate spectral clustering. . 15th Int Conf on Knowledge Discovery and Data Mining, , 2009. . p.907--916. . DOI:10.1145/1557019.1557118http://doi.org/10.1145/1557019.1557118..
Y Yang, , , D Xu, , , FP Nie, , , 等. . Image clustering using local discriminant models and global integration. . IEEE Trans Image Process, , 2010. . 19((10):):2761--2773. . DOI:10.1109/TIP.2010.2049235http://doi.org/10.1109/TIP.2010.2049235..
Y Yang, , , HT Shen, , , FP Nie, , , 等. . Nonnegative spectral clustering with discriminative regularization. . 25th AAAI Conf on Artificial Intelligence, , 2011. . p.555--560. . ..
Y Yang, , , F Nie, , , D Xu, , , 等. . A multimedia retrieval framework based on semisupervised ranking and relevance feedback. . IEEE Trans Patt Anal Mach Intell, , 2012. . 34((4):):723--742. . DOI:10.1109/TPAMI.2011.170http://doi.org/10.1109/TPAMI.2011.170..
XC Zhang, , , LL Zong, , , QZ You, , , 等. . Sampling for Nystrom extension-based spectral clustering: incremental perspective and novel analysis. . ACM Trans Knowl Discov Data, , 2016. . 11((1):):1--25. . DOI:10.1145/2934693http://doi.org/10.1145/2934693..
XJ Zhu, , , J Lafferty. . Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning. . 22nd Int Conf on Machine Learning, , 2005. . p.1052--1059. . DOI:10.1145/1102351.1102484http://doi.org/10.1145/1102351.1102484..
关联资源
相关文章
相关作者
相关机构