Unsupervised software clustering is the problem of automatically decomposing a software system into meaningful units. Some approaches rely solely on the structure of the system, such as the module dependency graph, to decompose it into cohesive groups of modules. Other techniques focus on the informal knowledge embedded in the source code itself to recover the modular architecture of the system. However, for large systems, both kinds of technique fail to produce decompositions that correspond to the actual architecture. To overcome this problem, we propose a novel approach that clusters a software system by incorporating knowledge from different viewpoints of the system, such as the knowledge embedded in the source code and the structural dependencies within the system. In this setting, we adopt a search-based approach to the encoding of multi-view clustering and investigate two ways to tackle the problem: one based on a linear combination of the objectives into a single objective, the other a multi-objective approach to clustering. We evaluate the two approaches on a dataset comprising 10 substantial open-source Java projects. Finally, we propose two techniques, based on interpolation and hierarchical clustering, to combine the different results obtained into a single result for the single-objective and multi-objective encodings, respectively.
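The difference between the two encodings named in the abstract can be illustrated with a minimal sketch. It is not the fitness function used in the paper: the per-view score `intra_cluster_ratio`, the weight `alpha`, and all function names are assumptions introduced here for illustration only. The toy score simply measures how much edge weight in one view (structural dependencies or lexical similarity) falls inside clusters. It is trivially maximised by a single cluster, so a real search would need a proper quality measure.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def intra_cluster_ratio(weights: Dict[Edge, float],
                        assignment: Dict[Hashable, int]) -> float:
    """Toy per-view score (an assumption, not the paper's measure):
    fraction of total edge weight whose endpoints share a cluster."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    inside = sum(w for (a, b), w in weights.items()
                 if assignment[a] == assignment[b])
    return inside / total


def combined_objective(structural: Dict[Edge, float],
                       lexical: Dict[Edge, float],
                       assignment: Dict[Hashable, int],
                       alpha: float = 0.5) -> float:
    """Single-objective encoding: linear combination of the two view scores.
    alpha is a hypothetical weight in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * intra_cluster_ratio(structural, assignment)
            + (1.0 - alpha) * intra_cluster_ratio(lexical, assignment))


def objective_vector(structural: Dict[Edge, float],
                     lexical: Dict[Edge, float],
                     assignment: Dict[Hashable, int]) -> Tuple[float, float]:
    """Multi-objective encoding: keep the view scores separate."""
    return (intra_cluster_ratio(structural, assignment),
            intra_cluster_ratio(lexical, assignment))


def dominates(u: Tuple[float, ...], v: Tuple[float, ...]) -> bool:
    """Pareto dominance for maximisation: u is no worse than v in every
    objective and strictly better in at least one."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


# Tiny made-up example: three modules, two views, one candidate clustering.
structural = {("A", "B"): 1.0, ("B", "C"): 1.0}
lexical = {("A", "B"): 1.0, ("A", "C"): 3.0}
assignment = {"A": 0, "B": 0, "C": 1}

score = combined_objective(structural, lexical, assignment, alpha=0.5)
vector = objective_vector(structural, lexical, assignment)
```

In the single-objective case, each choice of `alpha` gives the search a different scalar to optimise. In the multi-objective case, candidate clusterings are compared with `dominates`, and the search keeps a set of non-dominated solutions rather than one winner.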
|Title of host publication||22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER 2015)|
|Publication status||Published - 9 Apr 2015|