Removing Statistical Biases in Unsupervised Sequence Learning

Horman, Yoav; Kaminka, Gal A.

doi:10.1007/11552253_15


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Abstract

Unsupervised sequence learning is important to many applications. A learner is presented with unlabeled sequential data and must discover sequential patterns that characterize the data. Popular approaches to such learning include statistical analysis and frequency-based methods. We empirically compare these approaches and find that both suffer from biases toward shorter sequences and from an inability to group together multiple instances of the same pattern. We provide methods to address these deficiencies and evaluate them extensively on several synthetic and real-world data sets. The results show significant improvements in all learning methods used.
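The bias toward shorter sequences can be illustrated with a minimal sketch. It is not the authors' method: it assumes a naive frequency-based learner that simply counts contiguous subsequences, and the toy stream and the helper name `count_subsequences` are hypothetical. Under that assumption, a fragment can never occur less often than a longer sequence that starts with it, so ranking by raw frequency favors short fragments over the full pattern.

```python
from collections import Counter


def count_subsequences(stream, max_len):
    """Count every contiguous subsequence of length 1..max_len in `stream`."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(stream) - n + 1):
            counts[tuple(stream[i:i + n])] += 1
    return counts


# Hypothetical toy stream: the pattern "abc" appears twice, its prefix "ab"
# appears three times, and the remaining symbols act as noise.
stream = list("abcxabyabcz")
counts = count_subsequences(stream, max_len=3)

# Raw frequencies of the full pattern and of its shorter prefixes.
freq_abc = counts[("a", "b", "c")]
freq_ab = counts[("a", "b")]
freq_a = counts[("a",)]

# A prefix occurs at least as often as any longer sequence that extends it,
# so a frequency ranking never places the longer pattern above its prefix.
prefix_never_rarer = all(
    counts[seq] <= counts[seq[:-1]] for seq in counts if len(seq) > 1
)
```

Here the full pattern is outranked by its own prefix, which is one way a frequency-based ranking can end up preferring fragments to the sequences they belong to.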



Author information

Authors and Affiliations

The MAVERICK Group, Department of Computer Science, Bar-Ilan University, Israel
Yoav Horman & Gal A. Kaminka


Editor information

Editors and Affiliations

Institute for Information Technology, National Research Council Canada, Ottawa, Canada
A. Fazel Famili
LIACS, Leiden University, The Netherlands
Joost N. Kok
IFM, Linköping University, SE-58183, Linköping, Sweden
José M. Peña
Department of Computer Science, Universiteit Utrecht
Arno Siebes
Utrecht University, P.O. Box 80 089, NL-3508 TB Utrecht, the Netherlands
Ad Feelders

About this paper

Cite this paper

Horman, Y., Kaminka, G.A. (2005). Removing Statistical Biases in Unsupervised Sequence Learning. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_15


DOI: https://doi.org/10.1007/11552253_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)
