Online Induction of Probabilistic Real-Time Automata
The probabilistic real-time automaton (PRTA) is a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. Both steps can be very time intensive when a PRTA is to be induced for massive or even unbounded datasets. The latter step can be processed efficiently, as scalable online clustering algorithms exist. However, the creation of the PTA can still be very time consuming. To overcome this problem, we propose a genuinely online PRTA induction approach that incorporates new instances by first collapsing them and then applying maximum frequent pattern based clustering. The approach is tested against a predefined synthetic automaton and real-world datasets, on which it proves scalable and stable. Moreover, we present a broad evaluation on a real-world disease group dataset that shows the applicability of such a model to the analysis of medical processes.
Keywords: probabilistic real-time automata, online induction, maximum frequent pattern based clustering
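To make the first of the two induction steps named in the abstract more concrete, the following is a minimal sketch of a prefix tree acceptor that is extended one timed sequence at a time. It is an illustration under stated assumptions, not the authors' method: the class names `PTANode` and `OnlinePTA`, the `bin_width` parameter, and the fixed-width binning of time delays are all hypothetical choices made for this example, and the state-merging step is not shown.

```python
class PTANode:
    """One state of the prefix tree; children are keyed by (event, time_bin)."""

    def __init__(self):
        self.count = 0      # number of sequences that reached this state
        self.children = {}  # (event, time_bin) -> PTANode


class OnlinePTA:
    """Prefix tree acceptor grown incrementally (hypothetical sketch)."""

    def __init__(self, bin_width=1.0):
        self.root = PTANode()
        # Assumption: time delays are discretised into fixed-width bins.
        self.bin_width = bin_width

    def insert(self, sequence):
        """Add one timed sequence [(event, delay), ...] to the tree."""
        node = self.root
        node.count += 1
        for event, delay in sequence:
            key = (event, int(delay // self.bin_width))
            node = node.children.setdefault(key, PTANode())
            node.count += 1

    def transition_probabilities(self, node=None):
        """Relative frequency of each outgoing transition of a state.

        Any probability mass not covered by the returned values belongs
        to sequences that ended in this state.
        """
        if node is None:
            node = self.root
        return {key: child.count / node.count
                for key, child in node.children.items()}


pta = OnlinePTA(bin_width=5.0)
pta.insert([("a", 1.0), ("b", 7.0)])
pta.insert([("a", 2.0), ("c", 3.0)])
pta.insert([("d", 12.0)])
root_probs = pta.transition_probabilities()
```

Each call to `insert` touches only the states along one path, so the tree can be updated as instances arrive. Its size still grows with the number of distinct prefixes, which is consistent with the abstract's remark that PTA creation becomes costly for massive or unbounded data.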