Combining Word Semantics within Complex Hilbert Space for Information Retrieval

  • Peter Wittek
  • Bevan Koopman
  • Guido Zuccon
  • Sándor Darányi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8369)

Abstract

Complex numbers are fundamental to the mathematical formalism of quantum physics, yet quantum-like models developed outside physics have often overlooked their role. In particular, previous quantum-like models in Information Retrieval (IR) ignored complex numbers. We argue that, to advance the use of quantum models in IR, one has to lift the constraint of real-valued representations of the information space and package more information within the representation by means of complex numbers. As a first attempt, we propose a complex-valued representation for IR that explicitly uses complex-valued Hilbert spaces, so that terms, documents and queries are represented as complex-valued vectors. The proposal integrates distributional semantics evidence within the real component of a term vector, whereas ontological information is encoded in the imaginary component. Our proposal has the merit of lifting the role of complex numbers from a computational byproduct of the model to the very mathematical texture that unifies different levels of semantic information. An empirical instantiation of our proposal is tested in the TREC Medical Records Track, on the task of retrieving cohorts for clinical studies.
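
The representation described in the abstract can be illustrated with a minimal sketch. It assumes that the distributional and ontological evidence for a term are already available as two real-valued vectors of equal dimensionality, and it scores similarity by the magnitude of the normalised Hermitian inner product. The function names, the toy numbers and the choice of scoring function are assumptions made for illustration only; the abstract does not specify how the paper builds or compares its vectors.

```python
import math


def complex_term_vector(distributional, ontological):
    """Pair two real-valued vectors into one complex-valued term vector.

    Real part: distributional evidence. Imaginary part: ontological evidence.
    (Hypothetical helper; equal dimensionality is an assumption of this sketch.)
    """
    if len(distributional) != len(ontological):
        raise ValueError("both components must have the same dimensionality")
    return [complex(re, im) for re, im in zip(distributional, ontological)]


def hermitian_inner(u, v):
    """Inner product on a complex Hilbert space, conjugating the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def similarity(u, v):
    """Magnitude of the normalised Hermitian inner product, a value in [0, 1]."""
    norm_u = math.sqrt(hermitian_inner(u, u).real)
    norm_v = math.sqrt(hermitian_inner(v, v).real)
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return abs(hermitian_inner(u, v)) / (norm_u * norm_v)


# Toy, made-up vectors: two near-synonymous terms and one unrelated term.
fever = complex_term_vector([0.9, 0.1, 0.0], [0.2, 0.7, 0.1])
pyrexia = complex_term_vector([0.8, 0.2, 0.0], [0.2, 0.7, 0.1])
fracture = complex_term_vector([0.0, 0.1, 0.9], [0.9, 0.0, 0.1])

print(similarity(fever, pyrexia))
print(similarity(fever, fracture))
```

Taking the magnitude of the inner product discards its phase, which is one common way to obtain a real-valued score from complex vectors; other choices (for example, the real part only) would weight the two kinds of evidence differently.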

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Peter Wittek (1)
  • Bevan Koopman (2, 3)
  • Guido Zuccon (2)
  • Sándor Darányi (1)

  1. University of Borås, Borås, Sweden
  2. Australian e-Health Research Centre, CSIRO, Brisbane, Australia
  3. Queensland University of Technology, Brisbane, Australia
