Abstract
Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We present a simple and practical algorithm (TDAG) for discrete sequence prediction. Based on a text-compression method, the TDAG algorithm limits the growth of storage by retaining the most likely prediction contexts and discarding (forgetting) less likely ones. The storage/speed tradeoffs are parameterized so that the algorithm can be used in a variety of applications. Our experiments verify its performance on data compression tasks and show how it applies to two problems: dynamically optimizing Prolog programs for good average-case behavior and maintaining a cache for a database on mass storage.
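To make the idea of retaining likely contexts and forgetting unlikely ones concrete, here is a minimal sketch of a bounded-memory context predictor. It is not the TDAG algorithm itself, whose details are not given in this abstract. The class name `ContextPredictor`, the parameters `max_order` and `max_contexts`, and the use of raw frequency as a stand-in for "most likely" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    """Toy bounded-memory context predictor (illustrative only, not TDAG)."""

    def __init__(self, max_order=3, max_contexts=1000):
        self.max_order = max_order          # longest context length tracked (assumed)
        self.max_contexts = max_contexts    # storage budget in contexts (assumed)
        self.counts = defaultdict(Counter)  # context tuple -> next-symbol counts
        self.history = []                   # the most recent max_order symbols

    def update(self, symbol):
        # Record the symbol under every suffix of the recent history,
        # from the empty context up to max_order symbols.
        n = len(self.history)
        for k in range(min(self.max_order, n) + 1):
            context = tuple(self.history[n - k:])
            self.counts[context][symbol] += 1
        self.history.append(symbol)
        if len(self.history) > self.max_order:
            self.history.pop(0)
        if len(self.counts) > self.max_contexts:
            self._forget()

    def _forget(self):
        # Keep the most frequently seen contexts and discard the rest.
        ranked = sorted(
            self.counts,
            key=lambda c: sum(self.counts[c].values()),
            reverse=True,
        )
        for context in ranked[self.max_contexts:]:
            del self.counts[context]

    def predict(self):
        # Predict from the longest stored context that matches recent history.
        n = len(self.history)
        for k in range(min(self.max_order, n), -1, -1):
            context = tuple(self.history[n - k:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # nothing has been observed yet


predictor = ContextPredictor(max_order=3, max_contexts=1000)
for s in "abcabcabcabc":
    predictor.update(s)
print(predictor.predict())  # prints "a"
```

Lowering `max_contexts` trades prediction accuracy for storage, and lowering `max_order` trades it for speed. This loosely mirrors the parameterized storage/speed tradeoff described above.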
Laird, P., Saul, R. Discrete Sequence Prediction and Its Applications. Machine Learning 15, 43–68 (1994). https://doi.org/10.1023/A:1022661103485