
A Similarity Measure for Sequences of Categorical Data Based on the Ordering of Common Elements

Conference paper, Modeling Decisions for Artificial Intelligence (MDAI 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5285)

Abstract

Similarity measures are commonly used to compare items and to identify pairs or groups of similar individuals. The choice of similarity measure depends strongly on the type of values being compared. We address the case in which the information about each individual is a sequence of events (e.g., the sequence of web pages visited by a user, or a person's daily schedule). Several measures exist for numerical sequences, but very few methods handle sequences of categorical data. In this paper, we present a new similarity measure for sequences of categorical labels and compare it with previous approaches.
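The abstract does not define the measure itself, so the following is only a hypothetical sketch of the general idea named in the title: comparing two categorical sequences by the labels they share and by whether those shared labels appear in the same relative order. It is not the authors' measure. The function name `order_similarity`, the use of each label's first occurrence, and the choice to multiply a Jaccard overlap by a pairwise order-agreement ratio are all assumptions made for illustration.

```python
from itertools import combinations


def first_positions(seq):
    """Map each label to the index of its first occurrence in seq."""
    pos = {}
    for i, label in enumerate(seq):
        pos.setdefault(label, i)
    return pos


def order_similarity(a, b):
    """Hypothetical similarity in [0, 1] for two sequences of categorical labels.

    Assumption (not the paper's definition):
      overlap = |common labels| / |all labels|   (Jaccard index of the label sets)
      order   = fraction of pairs of common labels whose relative order
                (by first occurrence) is the same in both sequences
      result  = overlap * order
    Repeated labels are ignored beyond their first occurrence.
    """
    union = set(a) | set(b)
    if not union:
        return 1.0  # two empty sequences are treated as identical
    common = set(a) & set(b)
    overlap = len(common) / len(union)

    pa, pb = first_positions(a), first_positions(b)
    pairs = list(combinations(common, 2))
    if not pairs:
        order = 1.0  # fewer than two shared labels: no ordering to disagree on
    else:
        agree = sum(1 for x, y in pairs if (pa[x] < pa[y]) == (pb[x] < pb[y]))
        order = agree / len(pairs)
    return overlap * order


# Two hypothetical web-navigation sessions that visit the same pages,
# but swap the order of "news" and "sports".
s1 = ["home", "news", "sports", "contact"]
s2 = ["home", "sports", "news", "contact"]
print(order_similarity(s1, s2))  # 5 of the 6 label pairs agree in order
```

Multiplying the two factors means that sequences with no labels in common score 0, as do sequences whose shared labels (at least two of them) appear in exactly reversed order. This simple sketch ignores repeated events, which a measure intended for real event logs would probably need to take into account.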

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gómez-Alonso, C., Valls, A. (2008). A Similarity Measure for Sequences of Categorical Data Based on the Ordering of Common Elements. In: Torra, V., Narukawa, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2008. Lecture Notes in Computer Science, vol 5285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88269-5_13

  • DOI: https://doi.org/10.1007/978-3-540-88269-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88268-8

  • Online ISBN: 978-3-540-88269-5

  • eBook Packages: Computer Science, Computer Science (R0)
