TY - JOUR
T1 - A model-based clustering of expectation–maximization and K-means algorithms in crime hotspot analysis
AU - Appiah, Simon Kojo
AU - Wirekoh, Kingsley
AU - Aidoo, Eric Nimako
AU - Oduro, Samuel Dua
AU - Arthur, Yarhands Dissou
N1 - Funding Information:
This research received no external funding. Thanks to the anonymous reviewers and all editors involved in the revision process.
Publisher Copyright:
© 2022 The Author(s). This open access article is distributed under a Creative Commons Attribution (CC-BY) 4.0 license.
PY - 2022/12/31
Y1 - 2022/12/31
AB - Hotspot analysis of spatial attributes is a persistent research field in data mining, and model-based clustering is becoming increasingly popular for identifying trends and patterns in datasets on crime events occurring in space. The distributions of potential crime hotspots are parameterized as arising from multivariate Gaussian distributions, whose parameters are estimated by the expectation–maximization (E-M) algorithm, an iterative process whose convergence is very sensitive to initialization. In this study, a model-based clustering algorithm built on the E-M algorithm, initialized by K-means clustering using geodesic distance classification to estimate the model parameters, is explored and compared with the classical E-M algorithm, initialized with hierarchical clustering, to identify the distributional patterns of incidence of criminal activities. These model-based clustering algorithms were demonstrated on a large open-source dataset of violent crime activities that occurred in West Midlands County. Training the data as a Gaussian process, the study identified 12 hotspots of Gaussian mixed models as clusters of an ellipsoidal distribution varying in shape, volume, and orientation, which are mostly found in central parts of boroughs of the study area. The proposed model-based clustering of the E-M algorithm combined with the K-means clustering algorithm proved efficient, being fast and stable in convergence with a low probability of uncertainty in classification, and in some cases producing the same classification as the classical E-M and K-means algorithms. The combined model-based clustering techniques applied in the hotspot analysis of criminal activities in space will not only provide insight into crime prediction and resource allocation in combating strategies but also guide researchers in adopting mechanisms for modeling large spatial attributes in data mining.
KW - clusters
KW - crime hotspot
KW - expectation–maximization (E-M) algorithm
KW - Gaussian mixed models (GMMs)
KW - initialization
KW - K-means algorithm
KW - Model-based clustering
UR - http://www.scopus.com/inward/record.url?scp=85140470811&partnerID=8YFLogxK
U2 - 10.1080/27684830.2022.2073662
DO - 10.1080/27684830.2022.2073662
M3 - Article
AN - SCOPUS:85140470811
SN - 2768-4830
VL - 9
JO - Research in Mathematics
JF - Research in Mathematics
IS - 1
M1 - 2073662
ER -