Abstract
Many types of data encountered today are non-vectorial, yet most analysis techniques are based on vector spaces and depend heavily on the properties of the underlying vector space. So far, only highly specialized methods have been suggested for applying such vector space techniques to non-vectorial data. We present a uniform and general approach to constructing vector spaces from non-vectorial data. To this end, we develop a procedure that maps each data element into a special kind of coordinate space, which we call redundant dictionary space (RDS). The mapped elements can be added, scaled, and analyzed like vectors, which allows any vector space analysis technique to be used with any kind of data. The only requirement is the existence of a suitable inner product kernel.
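The abstract does not spell out the mapping procedure, so the following is only a minimal sketch of the general idea under a stated assumption: each coordinate of a data element is taken to be its kernel value against one element of a fixed dictionary (an empirical kernel map). This need not be the authors' RDS construction. The names `bigram_kernel`, `to_coordinates`, `add`, `scale`, and the example dictionary are hypothetical and chosen for illustration.

```python
from collections import Counter


def bigram_kernel(s, t):
    """Inner product of character-bigram count vectors of two strings.

    Illustrative kernel only; any suitable inner product kernel could be used.
    """
    a = Counter(s[i:i + 2] for i in range(len(s) - 1))
    b = Counter(t[i:i + 2] for i in range(len(t) - 1))
    return sum(a[g] * b[g] for g in a)


def to_coordinates(x, dictionary, kernel):
    """Assumed mapping: one coordinate per dictionary element."""
    return [kernel(x, d) for d in dictionary]


def add(u, v):
    """Component-wise sum of two coordinate vectors."""
    return [a + b for a, b in zip(u, v)]


def scale(c, u):
    """Multiply a coordinate vector by a scalar."""
    return [c * a for a in u]


# Hypothetical dictionary of strings (the non-vectorial data here).
dictionary = ["abab", "baba", "cccc"]

u = to_coordinates("abab", dictionary, bigram_kernel)
v = to_coordinates("ccc", dictionary, bigram_kernel)

# Once mapped, the strings can be combined like ordinary vectors.
mean = scale(0.5, add(u, v))
print(u, v, mean)
```

In this sketch the strings themselves are never added or scaled; only their coordinate vectors are, which is what makes standard vector space techniques applicable.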
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Klenk, S., Dippon, J., Burkovski, A., Heidemann, G. (2012). Redundant Dictionary Spaces as a General Concept for the Analysis of Non-vectorial Data. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2012. Lecture Notes in Computer Science, vol 7377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31488-9_20
DOI: https://doi.org/10.1007/978-3-642-31488-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31487-2
Online ISBN: 978-3-642-31488-9
eBook Packages: Computer Science (R0)