Distributions of pattern statistics in sparse Markov models
Markov models provide a good approximation to probabilities associated with many categorical time series, and thus they are applied extensively. However, a major drawback associated with them is that the number of model parameters grows exponentially in the order of the model, and thus only very low-order models are considered in applications. Another drawback is lack of flexibility, in that Markov models give relatively few choices for the number of model parameters. Sparse Markov models are Markov models with conditioning histories that are grouped into classes such that the conditional probability distribution for members of each class is constant. The model gives a better handling of the trade-off between bias associated with having too few model parameters and variance from having too many. In this paper, methodology for efficient computation of pattern distributions through Markov chains with minimal state spaces is extended to the sparse Markov framework.
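The parameter-count argument above can be made concrete with a small sketch. It is illustrative only and is not taken from the paper: the function names, the binary alphabet, the three-class partition of order-2 histories, and the probabilities are all hypothetical. It relies on the standard count that a full order-m model over k symbols has k^m (k - 1) free parameters, whereas a sparse Markov model with c history classes has c (k - 1).

```python
from itertools import product


def full_param_count(k: int, m: int) -> int:
    """Free parameters of a full order-m Markov model over k symbols."""
    return (k ** m) * (k - 1)


def sparse_param_count(k: int, num_classes: int) -> int:
    """Free parameters when histories are grouped into num_classes classes."""
    return num_classes * (k - 1)


# Hypothetical order-2 binary example: four histories grouped into three
# classes. Histories in the same class share one conditional distribution.
alphabet = (0, 1)
order = 2
classes = {
    "A": {(0, 0), (1, 1)},
    "B": {(0, 1)},
    "C": {(1, 0)},
}
# Made-up conditional distributions P(next symbol | class).
class_probs = {
    "A": {0: 0.7, 1: 0.3},
    "B": {0: 0.4, 1: 0.6},
    "C": {0: 0.2, 1: 0.8},
}

# Map each history to its class.
history_to_class = {
    h: name for name, members in classes.items() for h in members
}
# Check that the mapped histories are exactly all histories of this order.
assert set(history_to_class) == set(product(alphabet, repeat=order))


def next_symbol_probs(history):
    """Conditional distribution of the next symbol given the last `order` symbols."""
    return class_probs[history_to_class[tuple(history[-order:])]]
```

In this toy setting the full order-1 and order-2 binary models have 2 and 4 free parameters, while the three-class partition has 3, a count that no full binary Markov model can match. For a DNA alphabet, `full_param_count(4, 5)` already gives 3072 parameters, which shows how quickly the exponential growth limits the usable order.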
Keywords: Auxiliary Markov chain; Pattern distribution; Sparse Markov model; Variable length Markov chain
This material is based upon work supported by the National Science Foundation under Grant No. 1811933. The author would like to thank the reviewer for their insightful comments on the original version of the manuscript.