Unsupervised Detection of Anomalous Commits in Software Repositories

Open Access
Authors
Publication date 2025
Host editors
  • B. Bouqata
  • T. Seceleanu
  • Javier Berrocal
  • Kaoutar El Maghaouri
  • Yanchun Sun
Book title 2025 IEEE International Conference on Software Services Engineering : IEEE SSE 2025
Book subtitle Helsinki, Finland, 7-12 July 2025 : proceedings
ISBN
  • 9798331567903
ISBN (electronic)
  • 9798331567897
Event 2025 IEEE International Conference on Software Services Engineering, SSE 2025
Pages (from-to) 31-38
Number of pages 8
Publisher Los Alamitos, California: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Identifying anomalous commits is essential for maintaining software quality and reliability, as these anomalies can indicate potential issues in code, development practices, or repository management. Current anomaly detection methods typically rely on predefined rules or supervised learning, which suffer from limitations such as dependence on labeled datasets, rigid rule definitions, and high maintenance overhead in rapidly evolving repositories. This paper introduces a novel unsupervised framework for effectively detecting anomalous commits without requiring labeled data or rigid rules, providing a scalable and adaptable solution to enhance code quality in modern version control systems. To address the high-dimensional and multifaceted nature of commit data, our approach combines dimensionality reduction techniques with targeted feature engineering, enhancing both precision and adaptability in anomaly detection. We systematically evaluate three state-of-the-art unsupervised techniques, Local Outlier Factor (LOF), Isolation Forest (IF), and Histogram-Based Outlier Score (HBOS), across five diverse open-source repositories. Our results demonstrate that Isolation Forest achieves the highest detection accuracy, effectively balancing precision and recall while capturing both global and local anomalies. Additionally, expert validation confirms the practical relevance of our approach, providing insights into frequent and high-impact anomalies encountered in real-world repositories.
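
To make the idea of unsupervised commit scoring concrete, the following is a minimal sketch of one of the three techniques named in the abstract, Histogram-Based Outlier Score, written in plain Python. It is not the authors' implementation: this record does not describe the paper's features, preprocessing, or parameters, so the two commit features (lines changed, files touched), the sample values, the function name `hbos_scores`, and the choice of five equal-width bins are all assumptions made purely for illustration.

```python
import math


def hbos_scores(rows, n_bins=5):
    """Simplified HBOS: sum of log(peak_count / bin_count) over all features.

    Uses static equal-width bins per feature (an assumption of this sketch).
    Higher scores mean the row sits in sparsely populated bins.
    """
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        # A constant feature has zero range; fall back to a unit bin width.
        width = (hi - lo) / n_bins or 1.0
        # Clamp so the maximum value lands in the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bins):
            # Rows in the fullest bin add 0; rarer bins add a positive amount.
            scores[i] += math.log(peak / counts[b])
    return scores


# Hypothetical commits as (lines_changed, files_touched); illustrative only.
commits = [
    (12, 1), (20, 2), (15, 1), (18, 2),
    (25, 3), (10, 1), (22, 2), (5000, 140),
]
scores = hbos_scores(commits)
most_anomalous = max(range(len(commits)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 3))
```

In this toy data the last commit is far larger than the rest on both features, so it receives the highest score while the ordinary commits score zero. HBOS treats each feature independently, which keeps it fast but means it cannot see unusual combinations of otherwise normal values.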

Document type Conference contribution
Language English
Published at https://doi.org/10.1109/SSE67621.2025.00013
Other links https://www.proceedings.com/82141.html https://www.scopus.com/pages/publications/105016017456