Effectiveness of In-Context Learning for Due Diligence: A Reproducibility Study of Identifying Passages for Due Diligence
| Authors | |
|---|---|
| Publication date | 2025 |
| Journal | Information Retrieval Research Journal |
| Volume | 1 |
| Issue number | 2 |
| Pages (from-to) | 221-245 |
| Organisations |
|
| Abstract |
In recent years, Information Retrieval (IR) has evolved from ad hoc document retrieval to passage and answer retrieval, incorporating downstream Natural Language Processing (NLP). This has led to remarkable progress in models evaluated on early precision, while the potential to improve recall has received less attention. This paper investigates an extremely high-recall task through a reproducibility study of due diligence passage retrieval on a massive collection of merger and acquisition documents. We replicate previous work using Conditional Random Fields (CRF) and introduce a Python version of the effective CRFsuite approach. In addition, we explore the utility of open-source and closed-source Large Language Models (LLMs) with zero-shot and few-shot learning techniques on 50 different due diligence topics. Our findings reveal the potential of few-shot learning in due diligence: it delivers acceptable levels of recall, marking an essential step towards advanced due diligence models that minimize the dependency on the extensive training data typically required by domain-specific IR and NLP models. More generally, our results are an important first step toward developing advanced due diligence models for any legal information need.
|
| Document type | Article |
| Language | English |
| Published at | https://doi.org/10.54195/IRRJ.22626 |
| Downloads | Effectiveness of In-Context Learning for Due Diligence (Final published version) |
