An exploratory study on the behavior of different multiobjective algorithms in clustering

Kultzak, Adriano Francisco, 1991-

Date

2020

Abstract

Traditional clustering algorithms have difficulty finding patterns in datasets with heterogeneous structures because each algorithm, depending on the heuristic it implements, is biased toward identifying particular cluster shapes. The cluster ensemble technique addresses this problem by combining a diverse set of partitions, produced by algorithms based on different clustering criteria, into a consensus partition that best represents the various structures present in the data. Another approach to overcoming the limitations of traditional algorithms is multi-objective clustering, which optimizes more than one objective function simultaneously and thereby also takes advantage of the different clustering criteria that these functions express. This study presents the jMocle approach, built on the jMetal multi-objective framework and on MOCLE, a technique that combines multi-objective clustering and cluster ensembles. The experiments performed with jMocle compare different multi-objective algorithms in the context of clustering and evaluate the quality of the generated solutions as they evolve. The experiments also evaluate the quality of the generated partitions according to the Adjusted Rand index and compare the results with the state-of-the-art multi-objective clustering algorithms MOCK and Delta-MOCK. The results show no statistically significant difference, so it is not possible to claim that any one multi-objective algorithm is uniquely better for clustering. Given the absence of statistical evidence of a difference between the multi-objective algorithms, a qualitative analysis of the MOCLE, MOCK and Delta-MOCK methods was performed, considering the performance of the algorithms on the different types of datasets used.
This analysis explores characteristics of the algorithms such as the impact of the initialization, crossover and mutation procedures and the relevance of the number-of-neighbors parameter in identifying certain types of structure in the data.

URI

https://hdl.handle.net/1884/73831

Collections

Dissertations [351]