Early Prediction of Temporal Sequences Based on Information Transfer
In recent years, early prediction of ongoing sequences has become increasingly valuable in a wide variety of time-critical applications that need to classify a sequence at an early stage. Early prediction raises two challenging issues: why an ongoing sequence is early predictable, and how to reasonably determine the parameter k_optimal, the minimum number of elements that must be observed before an accurate classification can be made. To address these issues, this paper investigates the kinetic regularity of information transfer in sequence data sets. As a result, a new concept, Accumulatively Transferred Information (ATI), and its kinetic model in early predictable sequences are proposed. The model shows that information transfer in early predictable sequences follows an Inverse Heavy-tail Distribution (IHD), and that most of the uncertainty of an early predictable sequence is eliminated by only a few of its earliest elements, which is precisely the intrinsic, theoretically sound basis for the feasibility of early prediction. Based on the kinetic model, a heuristic algorithm is proposed to learn the parameter k_optimal. Experiments on real data sets validate the reasonableness and effectiveness of the proposed theory and algorithm.
Keywords: Early Prediction, Temporal Sequence, Information Transfer
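The abstract does not give the formal definition of ATI or the heuristic for k_optimal, so the following is only an illustrative sketch of the general idea. It assumes that the information accumulated after k elements can be approximated by the empirical mutual information between the class label and the length-k prefix, and that a usable k is the smallest prefix length at which this quantity reaches a chosen fraction of the class entropy. The function names, the `fraction` threshold, and the mutual-information estimate are all assumptions made for illustration, not the paper's model or algorithm.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def accumulated_information(sequences, labels, k):
    """Empirical mutual information between the label and the first k elements.

    Hypothetical stand-in for ATI: H(label) - H(label | prefix of length k).
    """
    h_class = entropy(Counter(labels).values())
    by_prefix = defaultdict(Counter)
    for seq, label in zip(sequences, labels):
        by_prefix[tuple(seq[:k])][label] += 1
    n = len(sequences)
    h_cond = sum(
        sum(c.values()) / n * entropy(c.values()) for c in by_prefix.values()
    )
    return h_class - h_cond


def smallest_sufficient_prefix(sequences, labels, fraction=0.9):
    """Smallest k whose accumulated information covers `fraction` of H(label).

    `fraction` is an assumed threshold, not a parameter taken from the paper.
    """
    h_class = entropy(Counter(labels).values())
    max_len = max(len(s) for s in sequences)
    for k in range(1, max_len + 1):
        if accumulated_information(sequences, labels, k) >= fraction * h_class:
            return k
    return max_len
```

On a toy data set in which the first element alone determines the class, for example `smallest_sufficient_prefix(["abx", "aby", "bax", "bay"], [0, 0, 1, 1])`, the search stops at the first element. On real data the prefix counts become sparse as k grows, so this naive estimate overstates the information gained and would need smoothing or held-out validation.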