No Labels? No Problem! Experiments with active learning strategies for multi-class classification in imbalanced low-resource settings
| Authors | |
|---|---|
| Publication date | 2023 |
| Book title | Nineteenth International Conference on Artificial Intelligence and Law |
| Book subtitle | Proceedings of the Conference : Braga, Portugal, June 19-23, 2023, Universidade do Minho Law School |
| ISBN (electronic) | |
| Event | 19th International Conference on Artificial Intelligence and Law, ICAIL 2023 |
| Pages (from-to) | 277-286 |
| Number of pages | 10 |
| Publisher | New York, New York: The Association for Computing Machinery |
| Organisations | |
| Abstract | Labeling textual corpora in their entirety is infeasible in most practical situations, yet it is a very common need today in public and private organizations. In contexts with large unlabeled datasets, active learning methods may reduce the manual labeling effort by selecting samples deemed more informative for the learning process. The paper elaborates on a method for multi-class classification based on state-of-the-art NLP active learning techniques, performing various experiments in low-resource and imbalanced settings. In particular, we refer to a dataset of Dutch legal documents constructed with two levels of imbalance; we study the performance of task-adapting a pre-trained Dutch language model, BERTje, and of using active learning to fine-tune the model to the task, testing several selection strategies. We find that, on the constructed datasets, an entropy-based strategy slightly improves the F1, precision, and recall convergence rates; and that the improvements are most pronounced in the severely imbalanced dataset. These results show promise for active learning in low-resource imbalanced domains but also leave space for further improvement. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3594536.3595171 |
| Other links | https://www.scopus.com/pages/publications/85177818812 |
| Downloads | 3594536.3595171 (Final published version) |
