The normalization of occurrence and Co-occurrence matrices in bibliometrics using Cosine similarities and Ochiai coefficients

Open Access
Authors
Publication date 11-2016
Journal Journal of the Association for Information Science and Technology
Volume | Issue number 67 | 11
Pages (from-to) 2805-2814
Organisations
  • Faculty of Social and Behavioural Sciences (FMG) - Amsterdam School of Communication Research (ASCoR)
Abstract
We prove that Ochiai similarity of the co-occurrence matrix is equal to cosine similarity in the underlying occurrence matrix. Neither the cosine nor the Pearson correlation should be used for the normalization of co-occurrence matrices because the similarity is then normalized twice, and therefore overestimated; the Ochiai coefficient can be used instead. Results are shown using a small matrix (5 cases, 4 variables) for didactic reasons, and also Ahlgren et al.'s (2003) co-occurrence matrix of 24 authors in library and information sciences. The overestimation is shown numerically and will be illustrated using multidimensional scaling and cluster dendograms. If the occurrence matrix is not available (such as in internet research or author cocitation analysis) using Ochiai for the normalization is preferable to using the cosine.
Document type Article
Language English
Published at https://doi.org/10.1002/asi.23603
Published at https://arxiv.org/abs/1503.08944
Downloads
1503.08944.pd (Accepted author manuscript)
Permalink to this page
Back