TY - JOUR
T1 - An improved density peak clustering algorithm guided by pseudo labels
AU - Wang, Yizhang
AU - Pang, Wei
AU - Zhou, Jingchu
N1 - Funding Information:
This research is supported by the China Postdoctoral Science Foundation (Grant No. 2022M713064), Lvyangjinfeng Excellent Doctoral Program of Yangzhou (Grant No. YZLYJFJH2021YXBS105), Innovation and Entrepreneurship Program of Jiangsu Province, China (Grant No. JSSCBS20211048), and Jiangsu Provincial Universities of Natural Science General Program, China (Grant No. 21KJB520021).
Publisher Copyright:
© 2022
PY - 2022/9/27
Y1 - 2022/9/27
N2 - Density peak clustering algorithms and their variants have achieved promising results in many fields over the last few years. However, most of these algorithms have parameters that must be fine-tuned by users. When facing real-world data without ground truth, it is often challenging and time-consuming to identify good parameter values for parametric clustering algorithms. Considering this, we propose a density peak clustering algorithm guided by pseudo labels (PLDPC), which avoids manually pre-specified parameters by applying the mutual information criterion. Specifically, we first design a novel pseudo-label generation method based on the theory of co-occurrence. Then, we maximize mutual information to obtain better clustering results. To evaluate the effectiveness of the proposed PLDPC algorithm, we conduct extensive experiments on 23 datasets, comprising six synthetic and seventeen real-world datasets. The experimental results show that PLDPC outperforms three classical algorithms (i.e., K-means, DPC, and DBSCAN) and eight state-of-the-art (SOTA) clustering algorithms in most cases.
AB - Density peak clustering algorithms and their variants have achieved promising results in many fields over the last few years. However, most of these algorithms have parameters that must be fine-tuned by users. When facing real-world data without ground truth, it is often challenging and time-consuming to identify good parameter values for parametric clustering algorithms. Considering this, we propose a density peak clustering algorithm guided by pseudo labels (PLDPC), which avoids manually pre-specified parameters by applying the mutual information criterion. Specifically, we first design a novel pseudo-label generation method based on the theory of co-occurrence. Then, we maximize mutual information to obtain better clustering results. To evaluate the effectiveness of the proposed PLDPC algorithm, we conduct extensive experiments on 23 datasets, comprising six synthetic and seventeen real-world datasets. The experimental results show that PLDPC outperforms three classical algorithms (i.e., K-means, DPC, and DBSCAN) and eight state-of-the-art (SOTA) clustering algorithms in most cases.
KW - Density peak clustering
KW - Maximizing mutual information
KW - Pseudo labels
UR - http://www.scopus.com/inward/record.url?scp=85134602937&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2022.109374
DO - 10.1016/j.knosys.2022.109374
M3 - Article
SN - 0950-7051
VL - 252
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 109374
ER -