Abstract
Semantic vector models generate high-dimensional vector representations of words from their occurrence statistics across large corpora of electronic text. In these models, each occurrence of a word or number is treated as a discrete event, even when the number is a measurement of a continuous property. Furthermore, the sequence in which words occur is often ignored. In earlier work we developed approaches to address these limitations, using graded demarcator vectors to represent measured distances in high-dimensional space. This permits the incorporation of continuous properties, such as the position of a character within a term or a year of birth, into semantic vector models. In this paper we extend this work by developing a novel representational approach for protein sequences, in which both the positions and the properties of the amino acid components of protein sequences are represented using graded vectors. Evaluation on a set of around 100,000 immunoglobulin receptor sequences derived from subjects recently infected with West Nile Virus (WNV) suggests that encoding positions and properties using graded vectors increases the similarity between immunoglobulin receptor sequences produced by cells from ancestral lines known to have developed in response to WNV, relative to those from other cell lines.
Notes
- 1.
A similar approach, with interpolation between random matrices rather than random vectors, has recently been proposed as a way to represent the positions of pixels within images [25].
- 2.
With binary vectors, superposition occurs probabilistically: if \(D(\alpha )\) has a 1 as its first element and \(D(\omega )\) does not, \(D(p_1)\) is generated with a 0.8 probability of a 1 in this position.
- 3.
This corresponds to the cosine metric if binary vectors are treated as vectors in {1,−1} not {1,0}. For example, \(1-(2/4)*\mathrm{HD}(1110, 1111) = 0.5\), and cos((0.5, 0.5, 0.5, −0.5), (0.5, 0.5, 0.5, 0.5)) = 0.5 (with 0.5 for normalized vector components after division by \(\sqrt{4}\)).
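Notes 2 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, bit strings stand in for binary vectors, and the per-bit sampling rule in `weighted_superpose` is an assumption chosen to match the 0.8 example in note 2.

```python
import math
import random


def hamming_similarity(a: str, b: str) -> float:
    """Similarity 1 - (2/n) * HD(a, b) for equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    hd = sum(x != y for x, y in zip(a, b))
    return 1 - (2 / len(a)) * hd


def bipolar_cosine(a: str, b: str) -> float:
    """Cosine after mapping bit 1 -> +1 and bit 0 -> -1."""
    u = [1 if x == "1" else -1 for x in a]
    v = [1 if x == "1" else -1 for x in b]
    dot = sum(p * q for p, q in zip(u, v))
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    return dot / (norm_u * norm_v)


def weighted_superpose(a: str, b: str, weight_a: float, rng: random.Random) -> str:
    """Assumed rule: each output bit copies a with probability weight_a, else b.

    Where a and b agree the bit is kept; where they differ, the bit from a
    appears with probability weight_a.
    """
    return "".join(x if rng.random() < weight_a else y for x, y in zip(a, b))


# Note 3: both measures agree on the worked example.
print(hamming_similarity("1110", "1111"))  # 0.5
print(bipolar_cosine("1110", "1111"))      # 0.5

# Note 2: a position holding 1 in one demarcator and 0 in the other.
rng = random.Random(0)
ones = sum(weighted_superpose("1", "0", 0.8, rng) == "1" for _ in range(10_000))
print(ones / 10_000)  # close to 0.8
```

The two similarity functions agree for any pair of equal-length bit strings, since each differing position lowers the bipolar dot product by 2 out of n.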
References
Clark, S., Pulman, S.: Combining symbolic and distributional models of meaning. In: AAAI Spring Symposium: Quantum Interaction, pp. 52–55 (2007)
Widdows, D., Cohen, T.: Graded semantic vectors: an approach to representing graded quantities in generalized quantum models. In: Atmanspacher, H., Filk, T., Pothos, E. (eds.) QI 2015. LNCS, vol. 9535, pp. 231–244. Springer, Heidelberg (2016). doi:10.1007/978-3-319-28675-4_18
Bohm, D.: Quantum Theory. Prentice-Hall, New York (1951). Republished by Dover, 1989
Cohen, T., Widdows, D., Wahle, M., Schvaneveldt, R.: Orthogonality and orthography: introducing measured distance into semantic space. In: Atmanspacher, H., Haven, E., Kitto, K., Raine, D. (eds.) QI 2013. LNCS, vol. 8369, pp. 34–46. Springer, Heidelberg (2014). doi:10.1007/978-3-642-54943-4_4
Kleinstein, S.H.: Getting started in computational immunology. PLoS Comput. Biol. 4(8), e1000128 (2008)
Benichou, J., Ben-Hamo, R., Louzoun, Y., Efroni, S.: Rep-seq: uncovering the immunological repertoire through next-generation sequencing. Immunology 135(3), 183–191 (2012)
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)
Tsioris, K., Gupta, N.T., Ogunniyi, A.O., Zimnisky, R.M., Qian, F., Yao, Y., Wang, X., Stern, J.N., Chari, R., Briggs, A.W., et al.: Neutralizing antibodies against West Nile virus identified directly from human B cells by single-cell analysis and next generation sequencing. Integr. Biol. 7(12), 1587–1597 (2015)
Wu, Y.-C., Kipling, D., Leong, H.S., Martin, V., Ademokun, A.A., Dunn-Walters, D.K.: High-throughput immunoglobulin repertoire analysis distinguishes between human IgM memory and switched memory B-cell populations. Blood 116, 1070–1078 (2010)
Ganapathiraju, M.K., Klein-Seetharaman, J., Balakrishnan, N., Reddy, R.: Characterization of protein secondary structure. IEEE Signal Process. Mag. 21(3), 78–87 (2004)
Asgari, E., Mofrad, M.R.: Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE 10(11), e0141287 (2015)
Kanerva, P., Kristofersson, J., Holst, A.: Random indexing of text samples for latent semantic analysis. In: Proceedings of the 22nd Annual Conference of the Cognitive Science Society, vol. 1036 (2000)
Widdows, D., Cohen, T.: Reasoning with vectors: a continuous model for fast robust inference. Logic J. IGPL 23(2), jzu028 (2015)
Kanerva, P.: Sparse Distributed Memory. MIT Press, Cambridge (1988)
Gayler, R.W.: Vector symbolic architectures answer Jackendoff’s challenges for cognitive neuroscience. In: Slezak, P. (ed.) ICCS/ASCS International Conference on Cognitive Science, University of New South Wales, Sydney, Australia, pp. 133–138 (2004)
Fodor, J.A., Pylyshyn, Z.W.: Connectionism and cognitive architecture: a critical analysis. Cognition 28(1–2), 3–71 (1988)
Smolensky, P.: Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artif. Intell. 46(1), 159–216 (1990)
Plate, T.A.: Holographic Reduced Representations: Distributed Representation for Cognitive Structures. CSLI Publications, Stanford (2003)
Kanerva, P.: Binary spatter-coding of ordered k-tuples. In: Artificial Neural Networks–ICANN 1996, pp. 869–873 (1996)
Hannagan, T., Dupoux, E., Christophe, A.: Holographic string encoding. Cogn. Sci. 35(1), 79–118 (2011)
Davis, C.J., Bowers, J.S.: Contrasting five different theories of letter position coding: Evidence from orthographic similarity effects. J. Exp. Psychol. Hum. Percept. Perform. 32(3), 535 (2006)
Sahlgren, M., Holst, A., Kanerva, P.: Permutations as a means to encode order in word space. In: Proceedings of the 30th Annual Meeting of the Cognitive Science Society, CogSci 2008, 23–26 July, Washington D.C., USA (2008)
Jones, M.N., Kintsch, W., Mewhort, D.J.: High-dimensional semantic space accounts of priming. J. Mem. Lang. 55(4), 534–552 (2006)
Cox, G.E., Kachergis, G., Recchia, G., Jones, M.N.: Toward a scalable holographic word-form representation. Behav. Res. Methods 43(3), 602–615 (2011)
Gallant, S.I., Culliton, P.: Positional binding with distributed representations. In: ICIVC, Portsmouth, UK (2016)
Aerts, D., Czachor, M., De Moor, B.: Geometric analogue of holographic reduced representation. J. Math. Psychol. 53(5), 389–398 (2007)
Aerts, D., Czachor, M.: Quantum aspects of semantic analysis and symbolic artificial intelligence. J. Phys. A Math. Gen. 37, L123–L132 (2004)
Kanerva, P.: Hyperdimensional computing: an introduction to computing in distributed representation with high-dimensional random vectors. Cogn. Comput. 1(2), 139–159 (2009)
Crick, F., Barnett, L., Brenner, S., Watts-Tobin, R.J.: General nature of the genetic code for proteins. Nature 192, 1227–1232 (1961). Macmillan Journals Limited
Lefranc, M.-P., Pommié, C., Ruiz, M., Giudicelli, V., Foulquier, E., Truong, L., Thouvenin-Contet, V., Lefranc, G.: IMGT unique numbering for immunoglobulin and T cell receptor variable domains and IG superfamily V-like domains. Dev. Comp. Immunol. 27(1), 55–77 (2003)
Tsioris, K., Gupta, N.T., Ogunniyi, A.O., Zimnisky, R.M., Qian, F., Yao, Y., Wang, X., Stern, J.N.H., Chari, R., Briggs, A.W., Clouser, C.R., Vigneault, F., Church, G.M., Garcia, M.N., Murray, K.O., Montgomery, R.R., Kleinstein, S.H., Love, J.C.: Neutralizing antibodies against West Nile virus identified directly from human B cells by single-cell analysis and next generation sequencing. Integr. Biol. 7(12), 1587–1597 (2015)
Gupta, N.T., Heiden, J.A.V., Uduman, M., Gadala-Maria, D., Yaari, G., Kleinstein, S.H.: Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31, 3356–3358 (2015)
Ogunniyi, A.O., Thomas, B.A., Politano, T.J., Varadarajan, N., Landais, E., Poignard, P., Walker, B.D., Kwon, D.S., Love, J.C.: Profiling human antibody responses by integrated single-cell analysis. Vaccine 32, 2866–2873 (2014)
Wahle, M., Widdows, D., Herskovic, J.R., Bernstam, E.V., Cohen, T.: Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts. In: AMIA Annual Symposium Proceedings, American Medical Informatics Association, vol. 2012, p. 940 (2012)
Widdows, D., Cohen, T.: The semantic vectors package: new algorithms and public tools for distributional semantics. In: Fourth IEEE International Conference on Semantic Computing (ICSC) (2010)
Acknowledgments
This research was supported by NIH/BD2K supplement R01LM011563-S1 and NIH/BD2K supplement R01AI104739-S1.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cohen, T., Widdows, D., Heiden, J.A.V., Gupta, N.T., Kleinstein, S.H. (2017). Graded Vector Representations of Immunoglobulins Produced in Response to West Nile Virus. In: de Barros, J., Coecke, B., Pothos, E. (eds) Quantum Interaction. QI 2016. Lecture Notes in Computer Science, vol 10106. Springer, Cham. https://doi.org/10.1007/978-3-319-52289-0_11
DOI: https://doi.org/10.1007/978-3-319-52289-0_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-52288-3
Online ISBN: 978-3-319-52289-0
eBook Packages: Computer Science, Computer Science (R0)