Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning
| Authors | |
|---|---|
| Publication date | 2020 |
| Host editors | |
| Book title | 2020 Conference on Empirical Methods in Natural Language Processing |
| Book subtitle | EMNLP 2020: proceedings of the conference: November 16–20, 2020 |
| ISBN (electronic) | |
| Event | 2020 Conference on Empirical Methods in Natural Language Processing |
| Pages (from-to) | 2186–2202 |
| Publisher | Stroudsburg, PA: The Association for Computational Linguistics |
| Organisations | |
| Abstract | Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning objective. In this paradigm, we discover a principled motivation for both the straight-through estimator (STE) as well as the recently-proposed SPIGOT – a variant of STE for structured models. Our perspective leads to new algorithms in the same family. We empirically compare the known and the novel pulled-back estimators against the popular alternatives, yielding new insight for practitioners and revealing intriguing failure cases. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://aclanthology.org/2020.emnlp-main.171/ |
| Downloads | 2020.emnlp-main.171 (Final published version) |
| Permalink to this page | |
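The abstract above centers on the argmax operation, whose gradient is null, and on the straight-through estimator (STE) as a surrogate. As a rough illustration only, the JAX sketch below shows an unstructured, categorical STE; the function name `straight_through_argmax` and the toy loss are hypothetical and are not taken from the paper, which treats the structured case.

```python
import jax
import jax.numpy as jnp

def straight_through_argmax(logits):
    """Forward pass: hard one-hot argmax. Backward pass: softmax gradient.

    argmax alone has a null gradient. The stop_gradient trick below keeps
    the hard one-hot value in the forward pass while letting gradients
    flow through the soft distribution instead (the STE surrogate).
    """
    soft = jax.nn.softmax(logits)
    hard = jax.nn.one_hot(jnp.argmax(logits), logits.shape[-1])
    # Value equals `hard`; the gradient is that of `soft`.
    return soft + jax.lax.stop_gradient(hard - soft)

def downstream_loss(logits):
    # Hypothetical toy objective: pull the latent hard choice toward index 2.
    z = straight_through_argmax(logits)
    target = jax.nn.one_hot(2, logits.shape[-1])
    return jnp.sum((z - target) ** 2)

logits = jnp.array([0.5, 1.5, 0.2, -0.3])
print(jax.grad(downstream_loss)(logits))  # nonzero despite the argmax inside
```

SPIGOT, the structured STE variant the abstract names, additionally projects the gradient-updated structure back onto the feasible set before computing the surrogate; that projection step is beyond this sketch.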