Journal of Computer Science and Technology, Volume 29, Issue 3, pp 345–360

Online Induction of Probabilistic Real-Time Automata

  • Jana Schmidt
  • Stefan Kramer
Regular Paper

Abstract

The probabilistic real-time automaton (PRTA) is a representation of dynamic processes arising in the sciences and industry. Currently, the induction of automata is divided into two steps: the creation of the prefix tree acceptor (PTA) and the merge procedure based on clustering of the states. Both steps can be very time-consuming when a PRTA is to be induced for massive or even unbounded datasets. The merge step can be processed efficiently, because scalable online clustering algorithms exist. The creation of the PTA, however, can still be very time-consuming. To overcome this problem, we propose a genuinely online PRTA induction approach that incorporates new instances by first collapsing them and then applying a maximum frequent pattern based clustering. The approach is tested against a predefined synthetic automaton and on real-world datasets, for which it is scalable and stable. Moreover, we present a broad evaluation on a real-world disease group dataset that shows the applicability of such a model to the analysis of medical processes.
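
As a rough illustration of the first of the two steps named in the abstract, the following Python sketch builds a prefix tree acceptor from timed event sequences and derives empirical transition probabilities from the counts. It is not the authors' implementation: the class `PTANode`, the functions `build_pta` and `transition_probability`, and the input format (a list of `(symbol, delay)` pairs per sequence) are assumptions made only for this example, and the merge and clustering step is omitted.

```python
from collections import defaultdict


class PTANode:
    """One state of a prefix tree acceptor (hypothetical layout)."""

    def __init__(self):
        self.children = {}               # event symbol -> child PTANode
        self.count = 0                   # number of sequences passing through this state
        self.delays = defaultdict(list)  # event symbol -> time delays seen on that edge


def build_pta(sequences):
    """Insert every timed sequence into a prefix tree, sharing common prefixes."""
    root = PTANode()
    for seq in sequences:
        node = root
        node.count += 1
        for symbol, delay in seq:
            node.delays[symbol].append(delay)
            if symbol not in node.children:
                node.children[symbol] = PTANode()
            node = node.children[symbol]
            node.count += 1
    return root


def transition_probability(node, symbol):
    """Empirical probability of taking `symbol` from `node` (0.0 if never observed)."""
    child = node.children.get(symbol)
    if child is None or node.count == 0:
        return 0.0
    return child.count / node.count


# Toy input: each sequence is a list of (event symbol, time delay) pairs.
example_sequences = [
    [("a", 1.0), ("b", 2.0)],
    [("a", 1.5), ("c", 0.5)],
    [("d", 3.0)],
]
example_root = build_pta(example_sequences)
```

Because every sequence adds a path from the root, a tree built this way grows with the data, which is the cost the abstract points to for massive or unbounded datasets.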

Keywords

probabilistic real-time automata, online induction, maximum frequent pattern based clustering

Copyright information

© Springer Science+Business Media New York & Science Press, China 2014

Authors and Affiliations

  1. Technische Universität München, Institut für Informatik/I12, Garching b. München, Germany
  2. Johannes Gutenberg-Universität Mainz, Institute for Computer Science, Mainz, Germany
