Post-clustering merging with novel metrics for multi-label image collections
| Authors | |
|---|---|
| Publication date | 01-09-2025 |
| Journal | Expert Systems With Applications |
| Article number | 127875 |
| Volume | 288 |
| Number of pages | 11 |
| Organisations | |
| Abstract | This study addresses the task of clustering multi-label image collections, which is increasingly important in fields such as forensics, social media, and intelligence. Traditional classification models fall short in real-world scenarios where labeled data may not be available, and unsupervised clustering offers a way forward in such cases. A clustering of multi-label data should minimize the number of clusters an analyst must inspect to identify all instances of a specific label (cluster efficiency), while also reducing misplaced data within each cluster (cluster quality). Existing clustering algorithms applied to multi-label image collections generally place a strong emphasis on either cluster efficiency or cluster quality. We propose a Post-Clustering Merging algorithm that provides greater control over the balance between cluster efficiency and quality in multi-label image collections, and that can be applied to the results of existing clustering algorithms. We introduce two external metrics designed for multi-label clustering: Pairwise Jaccard Similarity Score and Label Distribution Score. These metrics enable a nuanced evaluation of clustering quality and efficiency, respectively, in scenarios where single-label metrics are inadequate. We demonstrate the effectiveness of the algorithm on various multi-label image collections. The results indicate significant improvements: the algorithm not only gives more control, but also reduces the trade-off between cluster quality and efficiency. This study fills a gap in multi-label data collection analysis and sets a foundation for future exploration in this domain. |
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.1016/j.eswa.2025.127875 |
| Other links | https://www.scopus.com/pages/publications/105006710465 |
| Downloads | Post-clustering merging with novel metrics for multi-label image collections (Final published version) |
