Cochrane-auto: An Aligned Dataset for the Simplification of Biomedical Abstracts

Open Access
Authors
Publication date 2024
Host editors
  • M. Shardlow
  • H. Saggion
  • F. Alva-Manchego
  • M. Zampieri
  • K. North
  • S. Štajner
  • R. Stodden
Book title The Third Workshop on Text Simplification, Accessibility and Readability : proceedings of the workshop
Book subtitle TSAR 2024 : November 15, 2024
ISBN (electronic)
  • 9798891761766
Event 3rd Workshop on Text Simplification, Accessibility and Readability
Pages (from-to) 41-51
Number of pages 11
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
The most reliable and up-to-date information on health questions is found in the biomedical literature, but it remains inaccessible because of its complex, jargon-laden language. Domain-specific scientific text simplification holds the promise of making this literature accessible to a lay audience. We therefore create Cochrane-auto: a large corpus of aligned pairs of sentences, paragraphs, and abstracts drawn from biomedical abstracts and their lay summaries. Experiments demonstrate that a plan-guided simplification system trained on Cochrane-auto outperforms a strong baseline trained on unaligned abstracts and lay summaries. More generally, our freely available corpus complements Newsela-auto and Wiki-auto and facilitates text simplification research that goes beyond the sentence level and beyond direct lexical and grammatical revisions.
Document type Conference contribution
Language English
Published at https://doi.org/10.18653/v1/2024.tsar-1.5
Downloads
2024.tsar-1.5 (Final published version)