Removing Statistical Biases in Unsupervised Sequence Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Abstract

Unsupervised sequence learning is important to many applications. A learner is presented with unlabeled sequential data and must discover the sequential patterns that characterize the data. Popular approaches to such learning include statistical analysis and frequency-based methods. We empirically compare these approaches and find that both are biased toward shorter sequences and that neither can group together multiple instances of the same pattern. We provide methods to address these deficiencies and evaluate them extensively on several synthetic and real-world data sets. The results show significant improvements in all learning methods used.
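As an illustration only, and not the authors' method or data, the following minimal Python sketch shows one way a bias toward shorter sequences can arise when candidate patterns are ranked by raw frequency: every occurrence of a contiguous subsequence is also an occurrence of its prefix, so a longer candidate can never be counted more often than a shorter one it extends. The function name `ngram_counts` and the toy `stream` are assumptions made up for this example.

```python
from collections import Counter


def ngram_counts(seq, max_len):
    """Count every contiguous subsequence of length 1..max_len (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return counts


# Toy stream (an assumption for illustration): the pattern "abc" with
# noise symbols in between and one corrupted instance ("abz").
stream = list("abcxabcyabzabc")
counts = ngram_counts(stream, 3)

# Rank candidates by raw frequency, as a naive frequency-based learner might.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], len(kv[0])))

for pattern, freq in ranked[:6]:
    print("".join(pattern), freq)
```

In this toy stream the full pattern `abc` is counted 3 times, while its prefixes `a` and `ab` are each counted 4 times, so a ranking by raw count places the shorter fragments first even though the longer pattern is the one that characterizes the data.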

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Horman, Y., Kaminka, G.A. (2005). Removing Statistical Biases in Unsupervised Sequence Learning. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_15

  • DOI: https://doi.org/10.1007/11552253_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28795-7

  • Online ISBN: 978-3-540-31926-9

  • eBook Packages: Computer Science, Computer Science (R0)
