Tailed U-Net: Multi-Scale Music Representation Learning

Open Access
Authors
Publication date 2022
Host editors
  • P. Rao
  • H. Murthy
  • A. Srinivasamurthy
  • R. Bittner
  • R. Caro Repetto
  • M. Goto
  • X. Serra
  • M. Miron
Book title Proceedings of the 23rd International Society for Music Information Retrieval Conference
Book subtitle Bengaluru, India, December 04-08, 2022
ISBN (electronic)
  • 9781732729926
Event 23rd International Society for Music Information Retrieval Conference
Pages (from-to) 67-75
Number of pages 9
Publisher ISMIR
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
Self-supervised learning has steadily gained traction in recent years. In music information retrieval (MIR), one promising recent application of self-supervised learning is the CLMR framework (contrastive learning of musical representations). CLMR performs well, achieving results on par with state-of-the-art end-to-end classification models, but it is strictly an encoding framework. It suffers from the characteristic limitation of any encoder: it cannot explicitly combine multi-timescale information, whereas a characteristic feature of human audio perception is that we perceive all frequencies simultaneously. To this end, we propose a generalization of CLMR that learns to extract and explicitly combine representations across different frequency resolutions, which we coin the tailed U-Net (TUNe). TUNe architectures combine multi-timescale information during a decoding phase, similar to the U-Net architectures used in computer vision and source separation, but add a tail that reduces the sample-level output to a smaller, pre-defined number of representation dimensions. The size of the decoding phase is a hyperparameter; with a zero-layer decoding phase, TUNe reduces to CLMR. The best TUNe architectures, however, require less training time to match CLMR performance, transfer better to downstream tasks, and remain competitive with state-of-the-art models even at dramatically reduced dimensionalities.
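To make the abstract's high-level description concrete, the sketch below traces the TUNe data flow in NumPy: an encoder that repeatedly downsamples, a decoder that upsamples and fuses each level with its skip connection (the U-Net part), and a tail that collapses the sample-level output to a fixed representation size. All function names are hypothetical, and the pooling/repeat operations are crude stand-ins for the learned convolutions of the actual model; this is only an illustration of the multi-scale wiring, not the paper's implementation.

```python
import numpy as np

def encoder_block(x, stride=4):
    # Downsample by average pooling (a stand-in for a learned strided conv).
    T = (len(x) // stride) * stride
    return x[:T].reshape(-1, stride).mean(axis=1)

def decoder_block(low, skip):
    # Upsample the low-resolution features (nearest-neighbour repeat)
    # and fuse them with the higher-resolution skip connection.
    up = np.repeat(low, len(skip) // len(low) + 1)[: len(skip)]
    return up + skip

def tail(x, dims=64):
    # The "tail": reduce the sample-level signal to a fixed number
    # of representation dimensions.
    T = (len(x) // dims) * dims
    return x[:T].reshape(dims, -1).mean(axis=1)

def tune_forward(x, depth=2):
    # `depth` plays the role of the decoding-phase size hyperparameter:
    # with depth=0 no decoding happens and the model collapses to a
    # plain encoder, mirroring how TUNe reduces to CLMR.
    skips = []
    for _ in range(depth):
        skips.append(x)
        x = encoder_block(x)
    for skip in reversed(skips):
        x = decoder_block(x, skip)
    return tail(x)
```

Regardless of the decoding depth, the tail yields a representation of the same pre-defined dimensionality, which is what lets the zero-depth and multi-depth variants be compared head to head.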
Document type Conference contribution
Language English
Published at https://doi.org/10.5281/zenodo.7316596
Other links
  • https://ismir2022program.ismir.net/poster_109.html
  • https://www.ismir.net/conferences/ismir2022.html