Leveraging active learning for ocean data quality assessment: reducing labeling workload and addressing severe data imbalance challenges

Open Access
Authors
Publication date 10-2025
Journal International Journal of Data Science and Analytics
Volume | Issue number 20 | 5
Pages (from-to) 4777-4798
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Oceanic research initiatives like Argo, GLOSS, and EMSO aim to enhance our understanding of the oceans and climate through extensive data collection. Maintaining the quality of collected data is essential for effective data analysis and real-world applications. While automated and semi-automated tests can provide real-time or near-real-time validation, thorough quality control still depends on operator review. Consequently, current Quality Control (QC) processes continue to be labor-intensive. Machine Learning (ML) methods, which can analyze vast amounts of data and learn complex patterns autonomously, offer significant potential for improving QC processes. However, challenges like severe data disproportion persist for ML approaches. This article proposes exploiting active learning (AL) to assist QC experts, reducing their workload by proactively selecting informative data points for labeling. Targeting the data distribution challenge, AL, coupled with imbalance-resilient classifiers, enhances model performance in recognizing erroneous data points. To mitigate the cold-start problem in AL, we propose outlier detection for initializing classifiers, significantly reducing annotation costs. Our approach is tested on data generated by 5 Argo floats, demonstrating its feasibility to lessen the labeling workload for experts and tackle significant data imbalance. Although the experiments are limited in scale, the findings indicate a promising outlook for using active learning in ocean data quality assessment, facilitating an effective semi-automated quality control framework.
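The workflow outlined in the abstract (outlier-based initialization, an imbalance-aware classifier, and iterative querying of informative points) can be illustrated with a minimal, standard-library Python sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the toy one-dimensional data, the median/MAD outlier score, the class-weighted logistic regression, the uncertainty-sampling query rule, and all function and variable names (`outlier_scores`, `fit_weighted_logreg`, `active_learning`, `VALUES`, `LABELS`). The article's actual detectors, classifiers, and query strategies may differ.

```python
import math
import statistics


def outlier_scores(values):
    """Robust score: distance from the median in units of MAD (assumed detector)."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) or 1.0
    return [abs(v - med) / mad for v in values]


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logreg(xs, ys, epochs=500, lr=0.05):
    """Class-weighted logistic regression on one feature, by gradient descent."""
    n = len(xs)
    counts = {c: ys.count(c) for c in set(ys)}
    # Inverse-frequency weights so the rare "bad" class is not drowned out.
    weight = {c: n / (len(counts) * k) for c, k in counts.items()}
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = weight[y] * (sigmoid(w * x + b) - y)
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def active_learning(values, oracle, n_seed=5, n_queries=20):
    """Pool-based loop; `oracle(i)` stands in for the QC expert's label."""
    scores = outlier_scores(values)
    ranked = sorted(range(len(values)), key=lambda i: scores[i], reverse=True)
    # Cold start (assumption): label the most anomalous and most typical points
    # so that both classes are likely to be present from the first iteration.
    seed = ranked[:n_seed] + ranked[-n_seed:]
    labeled = {i: oracle(i) for i in seed}
    for _ in range(n_queries):
        idx = list(labeled)
        w, b = fit_weighted_logreg([values[i] for i in idx],
                                   [labeled[i] for i in idx])
        pool = [i for i in range(len(values)) if i not in labeled]
        # Uncertainty sampling: query the point with probability closest to 0.5.
        query = min(pool, key=lambda i: abs(sigmoid(w * values[i] + b) - 0.5))
        labeled[query] = oracle(query)
    idx = list(labeled)
    model = fit_weighted_logreg([values[i] for i in idx],
                                [labeled[i] for i in idx])
    return model, labeled, seed


# Toy, heavily imbalanced data: 200 "good" readings and 10 "bad" ones.
VALUES = [i * 0.01 for i in range(-100, 100)] + [5.0 + 0.1 * j for j in range(10)]
LABELS = [0] * 200 + [1] * 10

model, labeled, seed = active_learning(VALUES, lambda i: LABELS[i])
print(f"labels requested: {len(labeled)} of {len(VALUES)}")
```

In this sketch the expert is asked for 30 labels instead of 210; on real float profiles the features, classifier, and stopping rule would need to be chosen and validated against the QC flags actually in use.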
Document type Article
Language English
Published at https://doi.org/10.1007/s41060-025-00751-w
Other links https://www.scopus.com/pages/publications/105002454070