Efficiently identifying deterministic real-time automata from labeled data

Authors: Sicco Verwer (Katholieke Universiteit Leuven), Mathijs de Weerdt (Delft University of Technology), Cees Witteveen (Delft University of Technology)

First Online: 13 October 2011. Received: 22 January 2010. Accepted: 26 September 2011. DOI: 10.1007/s10994-011-5265-4

Cite this article as: Verwer, S., de Weerdt, M. & Witteveen, C. Mach Learn (2012) 86: 295. doi:10.1007/s10994-011-5265-4

Abstract

We develop a novel learning algorithm, RTI, for identifying a deterministic real-time automaton (DRTA) from labeled time-stamped event sequences. The RTI algorithm is based on the current state of the art in deterministic finite-state automaton (DFA) identification, called evidence-driven state-merging (EDSM). In addition to having a DFA structure, a DRTA contains time constraints between occurrences of consecutive events. Although this seems a small difference, we show that the problem of identifying a DRTA is much more difficult than the problem of identifying a DFA: identifying only the time constraints of a DRTA given its DFA structure is already NP-complete. In spite of this additional complexity, we show that RTI is a correct and complete algorithm that converges efficiently (using polynomial time and data) to the correct DRTA in the limit. To the best of our knowledge, this is the first algorithm that can identify a timed automaton model from time-stamped event sequences.
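
To make the notion of a DRTA concrete, the following is a minimal illustrative sketch, not the authors' implementation and not the RTI algorithm itself. It assumes that a timed sequence is a list of (symbol, delay) pairs, where the delay is the time elapsed since the previous event, and that each transition carries an inclusive delay interval as its time constraint. The class name `DRTA`, its methods, and the example automaton are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DRTA:
    """Illustrative DRTA: a DFA whose transitions carry delay guards."""
    start: int
    accepting: set
    # transitions[state][symbol] -> list of (lo, hi, target) with inclusive bounds
    transitions: dict = field(default_factory=dict)

    def add(self, src, symbol, lo, hi, dst):
        guards = self.transitions.setdefault(src, {}).setdefault(symbol, [])
        # Determinism: guards for the same (state, symbol) must not overlap.
        for g_lo, g_hi, _ in guards:
            if lo <= g_hi and g_lo <= hi:
                raise ValueError("overlapping delay guards break determinism")
        guards.append((lo, hi, dst))

    def accepts(self, timed_string):
        state = self.start
        for symbol, delay in timed_string:
            for lo, hi, dst in self.transitions.get(state, {}).get(symbol, []):
                if lo <= delay <= hi:
                    state = dst
                    break
            else:
                # No transition matches this symbol and delay: reject.
                return False
        return state in self.accepting


# Hypothetical example: an 'a' arriving within 5 time units leads to the
# accepting state 1, while a slower 'a' (6 to 10 units) stays in state 0.
example = DRTA(start=0, accepting={1})
example.add(0, "a", 0, 5, 1)
example.add(0, "a", 6, 10, 0)
example.add(1, "b", 0, 10, 1)
```

In this sketch the same symbol leads to different states depending only on the delay, which is the extra information a plain DFA structure does not carry.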

A straightforward alternative to identifying DRTAs is to identify a DFA that models time implicitly, i.e., a DFA that uses different states for different points in time. Such a DFA can be identified by first sampling the timed sequences using a fixed frequency, and subsequently applying EDSM to the resulting non-timed event sequences. We evaluate the performance of both RTI and this sampling approach experimentally on artificially generated data. In these experiments RTI outperforms the sampling approach significantly. Thus, we show that if we obtain data from a real-time system, it is easier to identify a DRTA from this data than to identify an equivalent DFA.
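
The sampling step can be pictured with the sketch below. It shows one possible encoding and is an assumption rather than the exact procedure used in the article: each timed sequence is a list of (symbol, delay) pairs, and one hypothetical `TICK` symbol is emitted for every sampling boundary that passes before an event. The function name `sample` and the `period` parameter are likewise illustrative.

```python
TICK = "_"  # hypothetical symbol marking one elapsed sampling period


def sample(timed_string, period):
    """Convert (symbol, delay) pairs into an untimed symbol list.

    One TICK is emitted for every sampling boundary crossed since the
    previous event, so elapsed time is encoded in the string itself.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    out = []
    now = 0            # absolute time of the current event
    ticks_emitted = 0  # sampling boundaries already written to out
    for symbol, delay in timed_string:
        now += delay
        ticks_due = int(now // period)
        out.extend([TICK] * (ticks_due - ticks_emitted))
        ticks_emitted = ticks_due
        out.append(symbol)
    return out
```

Under this encoding, longer delays or a shorter sampling period produce longer untimed strings, so a DFA identified from them needs a separate state for each distinguishable point in time.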

Keywords: Timed automata, Real-time automata, Identification in the limit, Supervised learning

Editor: Nicolo Cesa-Bianchi.

The main part of this research was performed when the first author was a PhD student at Delft University of Technology. It has been supported and funded by the Dutch Ministry of Economical Affairs under the SENTER program.

