From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions

Open Access
Authors
  • Nathanaël Carraz Rakotonirina
  • Mohammed Hamdy
  • Jon Ander Campos
  • Lucas Weber
Publication date 2025
Host editors
  • W. Che
  • J. Nabende
  • E. Shutova
  • M.T. Pilehvar
Book title The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025) : proceedings of the conference
Book subtitle ACL 2025 : July 27-August 1, 2025
ISBN (electronic)
  • 9798891762510
Event 63rd Annual Meeting of the Association for Computational Linguistics
Volume | Issue number 1
Pages (from-to) 19609-19642
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Large Language Models (LLMs) are increasingly used in working environments for a wide range of tasks, excelling at solving individual problems in isolation. However, are they also able to effectively collaborate over long-term interactions? To investigate this, we introduce MemoryCode, a synthetic multi-session dataset designed to test LLMs’ ability to track and execute simple coding instructions amid irrelevant information, simulating a realistic setting. While all the models we tested handle isolated instructions well, even the performance of state-of-the-art models like GPT-4o deteriorates when instructions are spread across sessions. Our analysis suggests this is due to their failure to retrieve and integrate information over long interaction chains. Our results highlight a fundamental limitation of current LLMs, restricting their ability to collaborate effectively in long interactions.
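For intuition only, a toy episode in the spirit of the abstract (a hypothetical illustration, not the paper's actual MemoryCode schema or evaluation code) might look like the following Python sketch, where an instruction from an early session must still be applied sessions later, amid irrelevant filler:

    # Hypothetical illustration (not the paper's actual data format): an
    # instruction given in an early session must survive later, unrelated
    # sessions and still be applied when it becomes relevant.
    sessions = [
        "Session 1 (mentor): From now on, prefix every function name with 'x_'.",
        "Session 2 (team): Chat about the office coffee machine.",  # irrelevant
        "Session 3 (mentor): Please write a function that reverses a string.",
    ]
    prompt = "\n\n".join(sessions)  # the model sees the full interaction history

    def follows_instruction(generated_code: str) -> bool:
        """Check that the Session 1 naming rule is still being applied."""
        return "def x_" in generated_code

    # A model that retrieved and integrated the earlier instruction would emit:
    reply = "def x_reverse(s):\n    return s[::-1]"
    assert follows_instruction(reply)

The abstract's finding is that models handle such an instruction well in isolation, but success drops once it must be recalled across intervening sessions like the filler above.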
Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2025.acl-long.964
Downloads
2025.acl-long.964 (Final published version)