Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning

Open Access
Authors
Publication date 2025
Host editors
  • Inês Lynce
  • Nello Murano
  • Mauro Vallati
  • Serena Villata
  • Federico Chesani
  • Michela Milano
  • Andrea Omicini
  • Mehdi Dastani
Book title ECAI 2025
Book subtitle 28th European Conference on Artificial Intelligence, 25-30 October 2025, Bologna, Italy : including 14th Conference on Prestigious Applications of Intelligent Systems (PAIS 2025) : proceedings
ISBN (electronic)
  • 9781643686318
Series Frontiers in Artificial Intelligence and Applications
Event 28th European Conference on Artificial Intelligence, ECAI 2025, including 14th Conference on Prestigious Applications of Intelligent Systems, PAIS 2025
Pages (from-to) 2538-2545
Number of pages 8
Publisher Amsterdam: IOS Press
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract

Safe reinforcement learning (RL) is crucial for real-world applications, and multi-agent interactions introduce additional safety challenges. While Probabilistic Logic Shields (PLS) have been a powerful proposal for enforcing safety in single-agent RL, their generalizability to multi-agent settings remains unexplored. In this paper, we address this gap by conducting extensive analyses of PLS within decentralized, multi-agent environments, and in doing so, propose Shielded Multi-Agent Reinforcement Learning (SMARL) as a general framework for steering MARL towards norm-compliant outcomes. Our key contributions are: (1) a novel Probabilistic Logic Temporal Difference (PLTD) update for shielded, independent Q-learning, which incorporates probabilistic constraints directly into the value update process; (2) a probabilistic logic policy gradient method for shielded PPO with formal safety guarantees for MARL; and (3) a comprehensive evaluation across symmetrically and asymmetrically shielded n-player game-theoretic benchmarks, demonstrating fewer constraint violations and significantly better cooperation under normative constraints. These results position SMARL as an effective mechanism for equilibrium selection, paving the way toward safer, socially aligned multi-agent systems.
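The abstract's first contribution, a shielded temporal-difference update, can be illustrated with a minimal sketch. The sketch below is not the paper's actual PLTD rule (which is defined in the full text); it only shows the general idea of folding a probabilistic safety model into a tabular TD backup, where each next-state action is weighted by an assumed safety probability `p_safe` so that likely-unsafe actions contribute less to the learned value. All names (`shielded_td_update`, `p_safe`) are hypothetical.

```python
import numpy as np

def shielded_td_update(Q, s, a, r, s_next, p_safe, alpha=0.1, gamma=0.99):
    """Illustrative shielded TD backup (not the paper's exact PLTD rule).

    Q       : (n_states, n_actions) tabular value estimates
    p_safe  : (n_states, n_actions) probability each action is safe,
              assumed to come from a probabilistic logic shield
    """
    # Reweight next-state action values by their safety probabilities,
    # normalised so the weights form a distribution over actions.
    weights = p_safe[s_next] / p_safe[s_next].sum()
    target = r + gamma * np.sum(weights * Q[s_next])  # safety-weighted bootstrap
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy usage: two states, two actions; the shield deems action 1 mostly unsafe.
Q = np.zeros((2, 2))
p_safe = np.array([[0.9, 0.1], [0.9, 0.1]])
Q = shielded_td_update(Q, s=0, a=0, r=1.0, s_next=1, p_safe=p_safe)
```

With zero-initialised values the bootstrap term vanishes, so the first update simply moves `Q[0, 0]` towards the reward by the learning rate.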

Document type Conference contribution
Language English
Published at https://doi.org/10.48550/arXiv.2411.04867 https://doi.org/10.3233/FAIA251103
Other links http://adsabs.harvard.edu/abs/2024arXiv241104867C https://www.scopus.com/pages/publications/105024464849
Downloads
FAIA-413-FAIA251103 (Final published version)