Orthogonality and Orthography: Introducing Measured Distance into Semantic Space

  • Trevor Cohen
  • Dominic Widdows
  • Manuel Wahle
  • Roger Schvaneveldt
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8369)


This paper explores a new technique for encoding structured information into a semantic model, for the construction of vector representations of words and sentences. As an illustrative application, we use this technique to compose robust representations of words based on sequences of letters that are tolerant of changes such as transposition, insertion and deletion of characters. Since these vectors are generated from the written form, or orthography, of a word, we call them ‘orthographic vectors’. The representation of discrete letters in a continuous vector space is an interesting example of a Generalized Quantum model, and the process of generating semantic vectors for letters in a word is mathematically similar to the derivation of orbital angular momentum in quantum mechanics. The importance (and sometimes the violation) of orthogonality is discussed in both mathematical settings. This work is grounded in the psychological literature on word representation and recognition, and is also motivated by potential technological applications such as genre-appropriate spelling correction. The mathematical method, examples and experiments, and the implementation and availability of the technique in the Semantic Vectors package are also discussed.
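The abstract does not spell out the construction, so the following is only a minimal sketch of one way a transposition-tolerant orthographic vector could be built. It is not the Semantic Vectors implementation. Everything in it is an assumption made for illustration: random bipolar letter vectors, position vectors obtained by linear interpolation between two nearly orthogonal random endpoint vectors (`START` and `END`), binding by elementwise multiplication, and all function names.

```python
import math
import random

DIM = 2000                      # assumed dimensionality, chosen for illustration
rng = random.Random(0)          # fixed seed so runs are repeatable


def random_vector():
    """Random bipolar vector; two such vectors are nearly orthogonal."""
    return [rng.choice((-1.0, 1.0)) for _ in range(DIM)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical ingredients: one random vector per letter, plus two
# endpoint vectors that mark the start and the end of a word.
LETTERS = {ch: random_vector() for ch in "abcdefghijklmnopqrstuvwxyz"}
START, END = random_vector(), random_vector()


def position_vector(i, n):
    """Interpolate between START and END, so that nearby positions get
    similar vectors and distant positions get dissimilar ones."""
    t = i / (n - 1) if n > 1 else 0.0
    return [(1.0 - t) * s + t * e for s, e in zip(START, END)]


def orthographic_vector(word):
    """Bind each letter to its position (elementwise product, an assumed
    binding operator) and superpose the bound pairs by addition."""
    total = [0.0] * DIM
    n = len(word)
    for i, ch in enumerate(word):
        pos = position_vector(i, n)
        letter = LETTERS[ch]
        for k in range(DIM):
            total[k] += letter[k] * pos[k]
    return total


def similarity(a, b):
    return cosine(orthographic_vector(a), orthographic_vector(b))


if __name__ == "__main__":
    for other in ("wrod", "drow", "cake"):
        print("word", other, round(similarity("word", other), 3))
```

Under these assumptions, swapping two adjacent letters ("word" versus "wrod") leaves the vectors highly similar, reversing the word lowers the similarity considerably, and a word with no shared letters scores near zero. The graded position vectors are what produce this behaviour: if each position had its own mutually orthogonal vector, a letter moved by one place would contribute nothing to the similarity.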


Distributional semantics, Orthographic similarity, Vector Symbolic Architectures



This research was supported by US National Library of Medicine grant R21 LM010826. We would like to thank Lance DeVine for the CHRR implementation used in this research, and Tom Landauer for providing the TASA corpus.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Trevor Cohen (1)
  • Dominic Widdows (2)
  • Manuel Wahle (1)
  • Roger Schvaneveldt (3)
  1. University of Texas School of Biomedical Informatics at Houston, Houston, USA
  2. Microsoft Bing, Bellevue, USA
  3. Arizona State University, Phoenix, USA
