Redundant Dictionary Spaces as a General Concept for the Analysis of Non-vectorial Data

  • Sebastian Klenk
  • Jürgen Dippon
  • Andre Burkovski
  • Gunther Heidemann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7377)


Many types of data we face today are non-vectorial, yet most analysis techniques are built on vector spaces and depend heavily on the underlying vector space properties. To apply such vector space techniques to non-vectorial data, so far only highly specialized methods have been suggested. We present a uniform and general approach for constructing vector spaces from non-vectorial data. For this we develop a procedure that maps each data element into a special kind of coordinate space which we call a redundant dictionary space (RDS). The mapped elements can be added, scaled, and analyzed like ordinary vectors, which allows any vector space analysis technique to be used with any kind of data. The only requirement is the existence of a suitable inner product kernel.
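The paper's exact RDS construction is not reproduced on this page, but the general idea the abstract describes — representing an arbitrary data element by its kernel values against a fixed dictionary of reference elements, so that the result is an ordinary coordinate vector — can be sketched as follows. The bigram kernel and the dictionary entries below are illustrative assumptions, not the authors' choices:

```python
from collections import Counter

def bigram_kernel(x, y):
    """A simple inner product kernel on strings: the dot product
    of their bigram count vectors (a 2-spectrum kernel)."""
    cx = Counter(x[i:i + 2] for i in range(len(x) - 1))
    cy = Counter(y[i:i + 2] for i in range(len(y) - 1))
    return sum(cx[g] * cy[g] for g in cx)

def rds_map(element, dictionary, kernel):
    """Map a non-vectorial element to a coordinate vector:
    one coordinate per dictionary entry, given by the kernel value."""
    return [kernel(element, d) for d in dictionary]

# Hypothetical dictionary of reference elements; any data type works
# as long as a suitable inner product kernel exists for it.
dictionary = ["hello world", "machine learning", "vector space"]

v = rds_map("hello machine", dictionary, bigram_kernel)
w = rds_map("vector machine", dictionary, bigram_kernel)

# The images are plain numeric vectors: they can be added and scaled,
# and therefore fed to any vector space analysis technique.
combined = [a + b for a, b in zip(v, w)]
```

The same pattern applies unchanged to heart beats, color histograms, or graphs: only the kernel and the dictionary change, not the downstream vector space analysis.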


Keywords: Vector Space · Data Element · Vectorial Data · Heart Beat · Color Histogram
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Sebastian Klenk (1)
  • Jürgen Dippon (2)
  • Andre Burkovski (1)
  • Gunther Heidemann (3)

  1. Visualization and Interactive Systems Institute, Stuttgart University, Stuttgart, Germany
  2. Institute of Stochastics and Applications, Stuttgart University, Stuttgart, Germany
  3. University of Osnabrück, Osnabrück, Germany
