TY - JOUR
T1 - An Adaptive Mutual K-nearest Neighbors Clustering Algorithm based on Maximizing Mutual Information
AU - Wang, Yizhang
AU - Pang, Wei
AU - Jiao, Zhixiang
N1 - Funding Information:
This research is supported by the China Postdoctoral Science Foundation (Grant No. 2022M713064), Lvyangjinfeng Excellent Doctoral Program of Yangzhou (Grant No. YZLYJFJH2021YXBS105), Innovation and Entrepreneurship Program of Jiangsu Province (Grant No. JSSCBS20211048), and Jiangsu Provincial Universities of Natural Science General Program (Grant No. 21KJB520021).
Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2023/5
Y1 - 2023/5
N2 - Clustering based on Mutual K-nearest Neighbors (CMNN) is a classical method for grouping data into clusters. However, it has two well-known limitations: (1) the clustering results depend heavily on the parameter k; (2) CMNN assumes that noise points correspond to small clusters under the Mutual K-nearest Neighbors (MKNN) criterion, so some data points in small clusters are wrongly identified as noise. To address these two issues, we propose an adaptive improved CMNN algorithm (AVCMNN), which consists of two parts: (1) an improved CMNN algorithm (abbreviated as VCMNN) and (2) an adaptive VCMNN algorithm (abbreviated as AVCMNN). In the first part, VCMNN, we reassign the data points in some small clusters through a novel voting strategy, because some of them are wrongly identified as noise points, which improves the clustering results. In the second part, AVCMNN, we construct an objective function based on maximizing mutual information to optimize the parameters of the proposed method, thereby obtaining better parameter values and clustering results. We conduct extensive experiments on twenty datasets, including six synthetic datasets, ten UCI datasets, and four image datasets. The experimental results show that VCMNN and AVCMNN outperform three classical algorithms (i.e., CMNN, DPC, and DBSCAN) and six state-of-the-art (SOTA) clustering algorithms in most cases.
AB - Clustering based on Mutual K-nearest Neighbors (CMNN) is a classical method for grouping data into clusters. However, it has two well-known limitations: (1) the clustering results depend heavily on the parameter k; (2) CMNN assumes that noise points correspond to small clusters under the Mutual K-nearest Neighbors (MKNN) criterion, so some data points in small clusters are wrongly identified as noise. To address these two issues, we propose an adaptive improved CMNN algorithm (AVCMNN), which consists of two parts: (1) an improved CMNN algorithm (abbreviated as VCMNN) and (2) an adaptive VCMNN algorithm (abbreviated as AVCMNN). In the first part, VCMNN, we reassign the data points in some small clusters through a novel voting strategy, because some of them are wrongly identified as noise points, which improves the clustering results. In the second part, AVCMNN, we construct an objective function based on maximizing mutual information to optimize the parameters of the proposed method, thereby obtaining better parameter values and clustering results. We conduct extensive experiments on twenty datasets, including six synthetic datasets, ten UCI datasets, and four image datasets. The experimental results show that VCMNN and AVCMNN outperform three classical algorithms (i.e., CMNN, DPC, and DBSCAN) and six state-of-the-art (SOTA) clustering algorithms in most cases.
KW - Adaptive clustering
KW - Maximizing mutual information
KW - Mutual K-nearest neighbors
UR - http://www.scopus.com/inward/record.url?scp=85145666249&partnerID=8YFLogxK
U2 - 10.1016/j.patcog.2022.109273
DO - 10.1016/j.patcog.2022.109273
M3 - Article
SN - 0031-3203
VL - 137
JO - Pattern Recognition
JF - Pattern Recognition
M1 - 109273
ER -