Identifying performance anomalies in fluctuating cloud environments: A robust correlative-GNN-based explainable approach
| Authors |
|
|---|---|
| Publication date | 08-2023 |
| Journal | Future Generation Computer Systems |
| Volume | 145 |
| Pages (from-to) | 77-86 |
| Organisations |
|
| Abstract |
Cloud computing provides scalable and elastic resources to customers as a low-cost, on-demand utility service. Multivariate time series anomaly detection is crucial to ensuring the overall performance of cloud computing systems. However, because cloud environments are complex and highly dynamic, handling detections triggered by irregular fluctuations in the data and keeping models robust remain challenging. To address these issues, we propose a deep learning-based anomaly detection method for multivariate time series in real-world operational clouds: Correlative-GNN with Multi-Head Self-Attention and Auto-Regression Ensemble Method (CGNN-MHSA-AR). Our method uses two parallel graph neural networks (GNNs) to learn the temporal and feature inter-dependencies, which reduces false positives. It also combines multi-head self-attention, a GRU, and an AR model to capture multi-dimensional information, leading to more robust detection. CGNN-MHSA-AR can also explain an anomaly based on the prediction error of its constituent univariate series. We compare the detection performance of CGNN-MHSA-AR with seven baseline methods on seven public datasets. The evaluation shows that CGNN-MHSA-AR outperforms its competitors with an average F1-score of 0.871, which is 19.9% better than state-of-the-art baseline methods. In addition, CGNN-MHSA-AR correctly identifies the root cause of detected anomalies with up to 74.1% accuracy.
|
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.1016/j.future.2023.03.020 |
| Other links | https://www.scopus.com/pages/publications/85150903095 |
| Permalink to this page | |
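
The abstract states that anomalies are explained through the prediction error of the constituent univariate series. As a rough illustration of that idea only, the sketch below ranks the series at one timestamp by squared prediction error. It is not the authors' implementation: the function name `explain_anomaly`, the squared-error scoring, the mean aggregation, the threshold value, and the metric names are all assumptions made for this example.

```python
from typing import List, Tuple


def explain_anomaly(
    observed: List[float],
    predicted: List[float],
    names: List[str],
    threshold: float,
) -> Tuple[bool, List[Tuple[str, float]]]:
    """Flag one timestamp and rank its univariate series by prediction error.

    Hypothetical helper: scoring and thresholding are illustrative assumptions,
    not the scheme used by CGNN-MHSA-AR.
    """
    if not names or not (len(observed) == len(predicted) == len(names)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Per-series squared prediction error (assumed scoring choice).
    errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    # Timestamp-level anomaly score: mean error over all series (assumption).
    score = sum(errors) / len(errors)
    # Series with the largest error come first as root-cause candidates.
    ranking = sorted(zip(names, errors), key=lambda pair: pair[1], reverse=True)
    return score > threshold, ranking


# Example call with made-up values for three hypothetical cloud metrics.
is_anomaly, ranking = explain_anomaly(
    observed=[0.9, 0.2, 0.5],
    predicted=[0.3, 0.2, 0.4],
    names=["cpu", "memory", "disk_io"],
    threshold=0.05,
)
```

In this sketch the first entry of `ranking` is the series whose forecast deviates most from the observation, which is the kind of candidate a root-cause explanation would surface first.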
