Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction

Gueniche, Ted; Fournier-Viger, Philippe; Tseng, Vincent S.

doi:10.1007/978-3-642-53917-6_16

Ted Gueniche²⁵,
Philippe Fournier-Viger²⁵ &
Vincent S. Tseng²⁶

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8347))

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

3212 Accesses
38 Citations

Abstract

Predicting the next item of a sequence over a finite alphabet has important applications in many domains. In this paper, we present a novel prediction model named CPT (Compact Prediction Tree) which losslessly compress the training data so that all relevant information is available for each prediction. Our approach is incremental, offers a low time complexity for its training phase and is easily adaptable for different applications and contexts. We compared the performance of CPT with state of the art techniques, namely PPM (Prediction by Partial Matching), DG (Dependency Graph) and All-K-th-Order Markov. Results show that CPT yield higher accuracy on most datasets (up to 12% more than the second best approach), has better training time than DG and PPM, and is considerably smaller than All-K-th-Order Markov.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Arlitt, M., Jin, T.: A workload characterization study of the 1998 world cup web site. IEEE Network 14(3), 30–37 (2000)
Article Google Scholar
Cleary, J., Witten, I.: Data compression using adaptive coding and partial string matching. IEEE Trans. on Inform. Theory 24(4), 413–421 (1984)
MathSciNet Google Scholar
Deshpande, M., Karypis, G.: Selective Markov models for predicting Web page accesses. ACM Transactions on Internet Technology 4(2), 163–184 (2004)
Article Google Scholar
Fournier-Viger, P., Gueniche, T., Tseng, V.S.: Using Partially-Ordered Sequential Rules to Generate More Accurate Sequence Prediction. In: Zhou, S., Zhang, S., Karypis, G. (eds.) ADMA 2012. LNCS (LNAI), vol. 7713, pp. 431–442. Springer, Heidelberg (2012)
Chapter Google Scholar
Padmanabhan, V.N., Mogul, J.C.: Using Prefetching to Improve World Wide Web Latency. Computer Communications 16, 358–368 (1998)
Google Scholar
Domenech, J., de la Ossa, B., Sahuquillo, J., Gil, J.A., Pont, A.: A taxonomy of web prediction algorithms. Expert Systems with Applications (9) (2012)
Google Scholar
Papapetrou, P., Kollios, G., Sclaroff, S., Gunopulos, D.: Discovering Frequent Arrangements of Temporal Intervals. In: Proc. of the 5th IEEE International Conference on Data Mining, pp. 354–361 (2005)
Google Scholar
Pitkow, J., Pirolli, P.: Mining longest repeating subsequence to predict world wide web surfing. In: Proc. 2nd USENIX Symposium on Internet Technologies and Systems, Boulder, CO, pp. 13–25 (1999)
Google Scholar
Sun, R., Giles, C.L.: Sequence Learning: From Recognition and Prediction to Sequential Decision Making. IEEE Intelligent Systems 16(4), 67–70 (2001)
Article Google Scholar
Willems, F., Shtarkov, Y., Tjalkens, T.: The context-tree weighting method: Basic properties. IEEE Trans. on Information Theory 31(3), 653–664 (1995)
Article MathSciNet Google Scholar
Zheng, Z., Kohavi, R., Mason, L.: Real world performance of association rule algorithms. In: Proc. 7th ACM Intern. Conf. on KDD, pp. 401–406 (2001)
Google Scholar

Download references

Author information

Authors and Affiliations

Dept. of Computer Science, University of Moncton, Canada
Ted Gueniche & Philippe Fournier-Viger
Dept. of Computer Science and Inf. Eng., National Cheng Kung University, Taiwan
Vincent S. Tseng

Authors

Ted Gueniche
View author publications
You can also search for this author in PubMed Google Scholar
Philippe Fournier-Viger
View author publications
You can also search for this author in PubMed Google Scholar
Vincent S. Tseng
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, Edmonton, University of Alberta, T6G 2E8, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Gueniche, T., Fournier-Viger, P., Tseng, V.S. (2013). Compact Prediction Tree: A Lossless Model for Accurate Sequence Prediction. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science(), vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_16

Download citation

DOI: https://doi.org/10.1007/978-3-642-53917-6_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics