Quantifying Compositionality of Classic and State-of-the-Art Embeddings

Open Access
Authors
  • Zhijin Guo
  • Chenhao Xue
  • Zhaozhen Xu
  • Hongbo Bo
  • Yuxuan Ye
  • Janet B. Pierrehumbert
  • M. Lewis
Publication date 2025
Host editors
  • C. Christodoulopoulos
  • T. Chakraborty
  • C. Rose
  • V. Peng
Book title The 2025 Conference on Empirical Methods in Natural Language Processing: Findings of EMNLP 2025
Book subtitle EMNLP 2025: November 4-9, 2025
ISBN (electronic)
  • 9798891763357
Event 30th Conference on Empirical Methods in Natural Language Processing, EMNLP 2025
Pages (from-to) 22130–22146
Publisher Kerrville, TX: Association for Computational Linguistics
Organisations
  • Interfacultary Research - Institute for Logic, Language and Computation (ILLC)
Abstract
For language models to generalize correctly to novel expressions, it is critical that they exploit compositional meanings when this is justified. Even if we don’t know what a “pelp” is, we can use our knowledge of numbers to understand that “ten pelps” are more pelps than “two pelps”. Static word embeddings such as Word2vec made strong, indeed excessive, claims about compositionality. State-of-the-art generative transformer models and graph models, however, go too far in the other direction by placing no real limits on shifts in meaning due to context. To quantify additive compositionality, we formalize a two-step, generalized evaluation that (i) measures the linearity between known entity attributes and their embeddings via canonical correlation analysis, and (ii) evaluates additive generalization by reconstructing embeddings for unseen attribute combinations and checking reconstruction metrics such as L2 loss, cosine similarity, and retrieval accuracy. These metrics also capture failure cases where linear composition breaks down. We evaluate sentence, knowledge-graph, and word embeddings, tracking compositionality across all layers and training stages. Stronger compositional signals are observed in later training stages across data modalities, and in deeper layers of the transformer-based model before a decline at the top layer. Code will be publicly available on GitHub upon acceptance.
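
The following is a minimal sketch of the two-step evaluation described in the abstract, not the authors' released code. It assumes entity attributes are given as one-hot or multi-hot NumPy arrays aligned row-wise with dense embeddings; the function names, the number of CCA components, and the least-squares additive map are illustrative assumptions.

```python
# Illustrative sketch (assumed implementation, not the paper's official code) of:
#  (i) linearity between attributes and embeddings via CCA, and
#  (ii) additive generalization: reconstruct embeddings for unseen attribute
#       combinations and score L2 loss, cosine similarity, retrieval accuracy.
import numpy as np
from sklearn.cross_decomposition import CCA


def linearity_via_cca(attributes, embeddings, n_components=8):
    """Step (i): mean canonical correlation between attribute and embedding views."""
    cca = CCA(n_components=n_components)
    attr_scores, emb_scores = cca.fit_transform(attributes, embeddings)
    corrs = [np.corrcoef(attr_scores[:, k], emb_scores[:, k])[0, 1]
             for k in range(n_components)]
    return float(np.mean(corrs))


def additive_generalization(train_attrs, train_embs, test_attrs, test_embs):
    """Step (ii): fit an additive (linear) map on seen attribute combinations,
    then reconstruct embeddings for held-out combinations and score them."""
    # Least-squares linear map from attribute vectors to embeddings.
    W, *_ = np.linalg.lstsq(train_attrs, train_embs, rcond=None)
    recon = test_attrs @ W

    # L2 reconstruction loss and cosine similarity against the true embeddings.
    l2 = float(np.linalg.norm(recon - test_embs, axis=1).mean())
    cos = float(np.mean(
        np.sum(recon * test_embs, axis=1)
        / (np.linalg.norm(recon, axis=1) * np.linalg.norm(test_embs, axis=1))))

    # Retrieval accuracy: each reconstruction should rank its own target first.
    recon_n = recon / np.linalg.norm(recon, axis=1, keepdims=True)
    embs_n = test_embs / np.linalg.norm(test_embs, axis=1, keepdims=True)
    acc = float(np.mean((recon_n @ embs_n.T).argmax(axis=1)
                        == np.arange(len(test_embs))))
    return {"l2": l2, "cosine": cos, "retrieval_acc": acc}
```

Failure of linear composition, as mentioned in the abstract, would show up here as low canonical correlations in step (i) or as poor cosine similarity and retrieval accuracy in step (ii).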
Document type Conference contribution
Note With checklist
Language English
Published at https://doi.org/10.18653/v1/2025.findings-emnlp.1206
Other links https://github.com/Zhijin-Guo1/quantifying-compositionality