Redundant Dictionary Spaces as a General Concept for the Analysis of Non-vectorial Data

Klenk, Sebastian; Dippon, Jürgen; Burkovski, Andre; Heidemann, Gunther

doi:10.1007/978-3-642-31488-9_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7377)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

Many of the data types encountered today are non-vectorial, yet most analysis techniques are built on vector spaces and depend heavily on their properties. So far, only highly specialized methods have been suggested for applying vector space techniques to non-vectorial data. We present a uniform and general approach to constructing vector spaces from non-vectorial data. To this end, we develop a procedure that maps each data element into a special kind of coordinate space, which we call a redundant dictionary space (RDS). The mapped elements can be added, scaled, and analyzed like vectors, which allows any vector space analysis technique to be used with any kind of data. The only requirement is the existence of a suitable inner product kernel.
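
The abstract does not spell out the construction, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes that a data element is represented by its kernel values against a fixed set of reference elements (a "dictionary"), and it uses a character-bigram kernel on strings as a stand-in for "a suitable inner product kernel". The names `bigram_kernel`, `to_coordinates`, `add`, and `scale` are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence


def bigram_kernel(a: str, b: str) -> float:
    """Inner product of the character-bigram count vectors of two strings.

    Assumption: this simple kernel stands in for whatever inner product
    kernel suits the data at hand.
    """
    ca = Counter(a[i:i + 2] for i in range(len(a) - 1))
    cb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    # A Counter returns 0 for missing keys, so unshared bigrams contribute nothing.
    return float(sum(ca[g] * cb[g] for g in ca))


def to_coordinates(
    x: str,
    dictionary: Sequence[str],
    kernel: Callable[[str, str], float],
) -> List[float]:
    """Map a non-vectorial element to one coordinate per dictionary element."""
    return [kernel(x, d) for d in dictionary]


def add(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Coordinate-wise sum of two mapped elements."""
    return [a + b for a, b in zip(u, v)]


def scale(c: float, u: Sequence[float]) -> List[float]:
    """Multiply a mapped element by a scalar."""
    return [c * a for a in u]


# Hypothetical dictionary of reference strings.
dictionary = ["data", "mining", "kernel"]

u = to_coordinates("data", dictionary, bigram_kernel)
v = to_coordinates("mining", dictionary, bigram_kernel)

# Once mapped, strings can be averaged like ordinary vectors.
mean = scale(0.5, add(u, v))
```

Once the strings are mapped, ordinary vector operations such as the average above are well defined, even though "the mean of two strings" has no direct meaning in the original data space.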

Author information

Authors and Affiliations

Visualization and Interactive Systems Institute, Stuttgart University, Stuttgart, 70569, Germany
Sebastian Klenk & Andre Burkovski
Institute of Stochastics and Applications, Stuttgart University, Stuttgart, 70569, Germany
Jürgen Dippon
University of Osnabrück, Osnabrück, 49069, Germany
Gunther Heidemann

Editor information

Editors and Affiliations

Institute of Computer Vision and Applied Computer Sciences, IBaI, Kohlenstraße 2, 04107, Leipzig, Germany
Petra Perner

About this paper

Cite this paper

Klenk, S., Dippon, J., Burkovski, A., Heidemann, G. (2012). Redundant Dictionary Spaces as a General Concept for the Analysis of Non-vectorial Data. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2012. Lecture Notes in Computer Science, vol 7377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31488-9_20

DOI: https://doi.org/10.1007/978-3-642-31488-9_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31487-2
Online ISBN: 978-3-642-31488-9
eBook Packages: Computer Science, Computer Science (R0)
