
Redundant Dictionary Spaces as a General Concept for the Analysis of Non-vectorial Data

  • Conference paper
Advances in Data Mining. Applications and Theoretical Aspects (ICDM 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7377)

Abstract

Many types of data we face today are non-vectorial, yet most analysis techniques are based on vector spaces and depend heavily on the properties of the underlying vector space. So far, only highly specialized methods have been suggested for applying such vector space techniques to non-vectorial data. We present a uniform and general approach to constructing vector spaces from non-vectorial data. To this end, we develop a procedure that maps each data element into a special kind of coordinate space, which we call a redundant dictionary space (RDS). The mapped vector space elements can be added, scaled, and analyzed like vectors, which allows any vector space analysis technique to be used with any kind of data. The only requirement is the existence of a suitable inner product kernel.
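
As a rough illustration of the idea of turning non-vectorial data into coordinates by means of an inner product kernel, the following minimal sketch maps strings to coordinate vectors relative to a small set of reference strings. It is not the RDS construction from the paper, which is not reproduced on this page. The kernel choice (a character-bigram spectrum kernel), the example dictionary, and the function names `bigram_kernel`, `to_coordinates`, `add`, and `scale` are all assumptions made for illustration only.

```python
from collections import Counter


def bigram_kernel(a: str, b: str) -> float:
    """Inner product of character-bigram count vectors (a simple string kernel).

    Assumption: chosen only as an example of a valid inner product kernel.
    """
    counts_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    counts_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    return float(sum(counts_a[g] * counts_b[g] for g in counts_a))


def to_coordinates(x, dictionary, kernel):
    """Map a data element to one coordinate per dictionary element.

    Hypothetical helper: each coordinate is the kernel value between x
    and one reference element.
    """
    return [kernel(x, d) for d in dictionary]


def add(u, v):
    """Component-wise vector addition."""
    return [a + b for a, b in zip(u, v)]


def scale(c, u):
    """Multiplication of a vector by a scalar."""
    return [c * a for a in u]


# Illustrative reference elements (not taken from the paper).
dictionary = ["abab", "bcbc", "abc"]

u = to_coordinates("ab", dictionary, bigram_kernel)
v = to_coordinates("bc", dictionary, bigram_kernel)

# Once mapped, the elements can be combined like ordinary vectors,
# for example to form their mean.
mean = scale(0.5, add(u, v))
print(u, v, mean)
```

Under these assumptions, the strings themselves never need a vector representation of their own; only kernel evaluations between data elements and reference elements are required.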

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Klenk, S., Dippon, J., Burkovski, A., Heidemann, G. (2012). Redundant Dictionary Spaces as a General Concept for the Analysis of Non-vectorial Data. In: Perner, P. (eds) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2012. Lecture Notes in Computer Science, vol 7377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31488-9_20

  • DOI: https://doi.org/10.1007/978-3-642-31488-9_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31487-2

  • Online ISBN: 978-3-642-31488-9

  • eBook Packages: Computer Science, Computer Science (R0)
