Sequence Discrimination Using Phase-Type Distributions

  • Jérôme Callut
  • Pierre Dupont
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


This paper proposes a novel approach to the classification of discrete sequences. The approach builds a model that fits dynamical features derived from the learning sample. These features are discrete phase-type (PH) distributions, which model the first passage times (FPT) between occurrences of pairs of substrings. The PHit algorithm, an adapted version of the Expectation-Maximization algorithm, is proposed to estimate the PH distributions. The most informative pairs of substrings are selected according to the Jensen-Shannon divergence between their class-conditional empirical FPT distributions. The selected features are then used in two classification schemes: a maximum a posteriori (MAP) classifier and support vector machines (SVM) with marginalized kernels. Experiments on DNA splicing region detection and on protein sublocalization illustrate that the proposed techniques are competitive with smoothed Markov chains and with SVM using a spectrum string kernel.
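
The feature-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy sequences, and the exact FPT convention (distance in symbols from the start of each occurrence of the first substring to the start of the next occurrence of the second) are assumptions made for illustration only. The sketch pools FPT samples per class, builds empirical distributions, and scores a substring pair by the Jensen-Shannon divergence (base 2, so the score lies in [0, 1]).

```python
import math
from collections import Counter


def first_passage_times(seq, a, b):
    """FPTs from each occurrence of substring `a` to the next occurrence of `b`.

    Assumed convention: distance between start positions, counted in symbols.
    """
    times = []
    start = seq.find(a)
    while start != -1:
        nxt = seq.find(b, start + 1)
        if nxt == -1:
            break  # no later occurrence of b, so no further passages exist
        times.append(nxt - start)
        start = seq.find(a, start + 1)
    return times


def empirical_distribution(times):
    """Normalize a list of FPT samples into a {time: probability} dict."""
    counts = Counter(times)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    js = 0.0
    for t in set(p) | set(q):
        pt, qt = p.get(t, 0.0), q.get(t, 0.0)
        m = 0.5 * (pt + qt)
        if pt > 0:
            js += 0.5 * pt * math.log2(pt / m)
        if qt > 0:
            js += 0.5 * qt * math.log2(qt / m)
    return js


def pair_score(class_a_seqs, class_b_seqs, a, b):
    """Score the substring pair (a, b) by the divergence of class-conditional FPTs."""
    fpt_a = [t for s in class_a_seqs for t in first_passage_times(s, a, b)]
    fpt_b = [t for s in class_b_seqs for t in first_passage_times(s, a, b)]
    if not fpt_a or not fpt_b:
        return 0.0  # assumption: a pair unseen in one class is scored as uninformative
    return jensen_shannon(empirical_distribution(fpt_a),
                          empirical_distribution(fpt_b))


# Toy data (hypothetical): "a" is followed by "b" quickly in one class, slowly in the other.
positives = ["abxxab", "abab"]
negatives = ["axxxb", "axxxbaxxxb"]
score = pair_score(positives, negatives, "a", "b")
```

Pairs with a high score would be retained as features. In the paper the retained FPT distributions are then modeled by PH distributions estimated with PHit, a step this sketch does not cover.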


Keywords: Supervised sequence classification, Markov chains, First passage times, Expectation-Maximization, Jensen-Shannon divergence



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jérôme Callut (1, 2)
  • Pierre Dupont (1, 2)
  1. Department of Computing Science and Engineering, INGI, Université catholique de Louvain, Louvain-la-Neuve, Belgium
  2. UCL Machine Learning Group
