Equivariant and coordinate independent convolutional networks: A gauge field theory of neural networks
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 05-03-2024 |
| Number of pages | 500 |
| Organisations | |

Abstract

In this thesis, Equivariant and Coordinate Independent Convolutional Networks, we develop a gauge theory of artificial neural networks for processing spatially structured data like images, audio, or videos. The standard neural network architectures for such data are convolutional networks, which are characterized by their position-independent inference. Because they generalize whatever they learn across spatial locations, convolutional networks are substantially more data-efficient and robust than non-convolutional models. This characteristic is especially important in domains like medical imaging, where training data is scarce.
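Position independence can be checked numerically. The minimal NumPy sketch below assumes periodic boundaries (so shifts wrap around) and uses hypothetical helper names (`circular_correlate2d`, `translate`, `dense`) chosen for illustration. It applies one shared kernel at every pixel and confirms that shifting the input and then convolving gives the same result as convolving and then shifting, while a generic dense linear map fails the same test.

```python
import numpy as np

def circular_correlate2d(image, kernel):
    """2D cross-correlation with periodic (wrap-around) boundaries.

    One shared kernel is applied at every pixel, so all positions are
    treated identically (weight sharing).
    """
    out = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    ch, cw = kh // 2, kw // 2  # kernel centre (odd-sized kernel assumed)
    for i in range(kh):
        for j in range(kw):
            # Bring the neighbour at offset (i - ch, j - cw) onto each pixel.
            out += kernel[i, j] * np.roll(image, shift=(ch - i, cw - j), axis=(0, 1))
    return out

def translate(image, dy, dx):
    """Cyclically shift an image by (dy, dx) pixels."""
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

# Convolution: shifting then convolving equals convolving then shifting.
lhs = circular_correlate2d(translate(image, 2, -3), kernel)
rhs = translate(circular_correlate2d(image, kernel), 2, -3)
print(np.allclose(lhs, rhs))  # True

# A generic dense (position-dependent) linear map lacks this property.
W = rng.standard_normal((64, 64))

def dense(img):
    return (W @ img.ravel()).reshape(img.shape)

print(np.allclose(dense(translate(image, 2, -3)), translate(dense(image), 2, -3)))  # False
```

This only illustrates that convolution is sufficient for translation equivariance; the converse direction (necessity) is a theorem of the thesis and cannot be established by a numerical spot check.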

The independence from spatial locations is formally captured by the networks' translation group equivariance, i.e., the property that they commute with translations of their input signals. We show that the convolutional network design is not only sufficient for translation equivariance but also necessary: convolutions can therefore be derived by demanding the model's equivariance. The first part of this thesis leverages this insight to define generalized convolutional networks that are equivariant under larger symmetry groups. Such models generalize their inference over additional geometric transformations, for instance, rotations or reflections of patterns in images. We demonstrate empirically that they exhibit significantly enhanced data efficiency, convergence rate, and final performance compared to conventional convolutional networks. Our publicly available implementation has found wide use in the research community.

In the second part, we extend convolutional networks further to process signals on Riemannian manifolds. Beyond flat Euclidean images, this setting includes, e.g., spherical signals like global weather patterns on the Earth's surface, or signals on general surfaces like artery walls or the cerebral cortex. We show that convolution kernels on manifolds must be equivariant under local gauge transformations if the networks' inference is to be coordinate independent. The resulting coordinate independent networks are proven to be equivariant with respect to the manifolds' global symmetries (isometries).

Our objective is not to propose yet another equivariant network design for a narrow application domain, but to devise a unifying mathematical framework for convolutional networks. The last part of this thesis demonstrates the generality of our differential geometric formulation of convolutional networks by showing that it can explain a vast number of equivariant network architectures from the literature.
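The generalized convolutions of the first part can be illustrated, under strong simplifications, with the group of 90° rotations (C4) acting on a square pixel grid. The sketch below is not the thesis' implementation: it assumes periodic boundaries, and `circular_correlate2d` and `c4_lifting_conv` are hypothetical helpers. A "lifting" layer correlates the input with all four rotated copies of one kernel, producing four feature maps. Rotating the input then rotates each feature map and cyclically shifts the rotation channel, whereas a plain convolution with a single generic kernel is not rotation equivariant.

```python
import numpy as np

def circular_correlate2d(image, kernel):
    """2D cross-correlation with periodic boundaries and an odd-sized kernel."""
    out = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    ch, cw = kh // 2, kw // 2
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * np.roll(image, shift=(ch - i, cw - j), axis=(0, 1))
    return out

def c4_lifting_conv(image, kernel):
    """Correlate one image with the 4 rotated copies of a single kernel.

    Returns shape (4, H, W): channel s uses the kernel rotated by s * 90 degrees.
    """
    return np.stack([circular_correlate2d(image, np.rot90(kernel, s)) for s in range(4)])

rng = np.random.default_rng(1)
image = rng.standard_normal((9, 9))   # square grid, so rot90 keeps the shape
kernel = rng.standard_normal((3, 3))

# Lifting layer: rotating the input rotates each feature map spatially and
# cyclically shifts the rotation channel by one step.
out_rot = c4_lifting_conv(np.rot90(image), kernel)
expected = np.rot90(np.roll(c4_lifting_conv(image, kernel), shift=1, axis=0), axes=(1, 2))
print(np.allclose(out_rot, expected))  # True

# A plain convolution with one generic kernel does not commute with rotations.
plain_rot = circular_correlate2d(np.rot90(image), kernel)
plain_expected = np.rot90(circular_correlate2d(image, kernel))
print(np.allclose(plain_rot, plain_expected))  # False
```

Sharing weights across rotations plays the same role here that sharing weights across positions plays for translations: the output transforms in a predictable way (a group action on the feature maps) instead of arbitrarily.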

| Document type | PhD thesis |
|---|---|
| Language | English |