
An efficient pattern mining approach for event detection in multivariate temporal data

  • Regular Paper
  • Knowledge and Information Systems

Abstract

This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present recent temporal pattern (RTP) mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time-series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the minimal predictive recent temporal patterns (MPRTP) framework for selecting a small set of predictive and non-spurious patterns. We apply our methods to predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, a key step toward developing intelligent patient monitoring and decision support systems.


Notes

  1. Sequential pattern mining is a special case of time-interval pattern mining, in which all intervals are simply time points with zero durations.

  2. If \(E.s = E.e\), state interval \(E\) corresponds to a time point.

  3. If two state intervals have the same start time, we sort them by their end time. If they also have the same end time, we sort them by lexical order of their variable names (as proposed by [21]).

  4. This section contains materials that have been published in [6].

  5. It is more efficient to mine patterns that cover more than \(n\) instances in one of the classes compared to mining patterns that cover more than \(n\) instances in the entire database (the former is always a subset of the latter).

  6. The observations of the clinical variables are irregular in time because they are measured asynchronously at different time moments.

  7. We apply statistical significance testing with k-fold cross-validation. In this setting, the testing sets are independent of each other, but the training sets are not. Even though this does not perfectly fit the iid assumption, the significance results are still of great help in comparing different learning methods [27].

  8. As discussed in Sect. 4.2, we mine frequent patterns for the positives and negatives separately using the local minimum supports.

  9. Most of the highest-scoring MPRTPs predict the RENAL category because it is the easiest prediction task. So to diversify the patterns, we show the top three predictive MPRTPs for RENAL and the top two MPRTPs for each of the other categories.

    Table 4 Diabetes dataset: the top MPRTPs with their precision and recall

References

  1. Abramowitz M, Stegun IA (1964) Handbook of mathematical functions with formulas, graphs, and mathematical tables

  2. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the international conference on very large data bases (VLDB)

  3. Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the international conference on data engineering (ICDE)

  4. Allen JF (1984) Towards a general theory of action and time. Artif Intell 23:123–154


  5. Batal I, Cooper G, Hauskrecht M (2012) A Bayesian scoring technique for mining predictive and non-spurious rules. In: Proceedings of the European conference on principles of data mining and knowledge discovery (PKDD)

  6. Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)

  7. Batal I, Hauskrecht M (2009) A supervised time series feature extraction technique using DCT and DWT. In: International conference on machine learning and applications (ICMLA)

  8. Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2011) A pattern mining approach for classifying multivariate temporal data. In: Proceedings of the IEEE international conference on bioinformatics and biomedicine (BIBM)

  9. Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2013) A temporal pattern mining approach for classifying electronic health record data. ACM Trans Intell Syst Technol 4(4). doi:10.1145/2508037.2508044

  10. Blasiak S, Rangwala H (2011) A hidden Markov model variant for sequence classification. In: Proceedings of the international joint conferences on artificial intelligence (IJCAI)

  11. Chandola V, Eilertson E, Ertoz L, Simon G, Kumar V (2006) Data mining for cyber security. In: Data warehousing and data mining techniques for computer security. Springer, Berlin

  12. Cheng H, Yan X, Han J, Hsu C-W (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the international conference on data engineering (ICDE)

  13. Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17:1036–1050


  14. Exarchos TP, Tsipouras MG, Papaloukas C, Fotiadis DI (2008) A two-stage methodology for sequence classification based on sequential pattern mining and optimization. Data Knowl Eng 66:467–487


  15. Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3)

  16. Guttormsson SE, Marks RJ, El-Sharkawi MA, Kerszenbaum I (1999) Elliptical novelty grouping for on-line short-turn detection of excited running rotors. IEEE Trans Energy Convers 14(1):16–22

  17. Hauskrecht M, Batal I, Valko M, Visweswaram S, Cooper G, Clermont G (2012) Outlier detection for patient monitoring and alerting. J Biomed Inform 46(1):47–55

  18. Hauskrecht M, Valko M, Batal I, Clermont G, Visweswaram S, Cooper G (2010) Conditional outlier detection for clinical alerting. In: Proceedings of the American Medical Informatics Association (AMIA)

  19. Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20:197–243

  20. Höppner F (2001) Discovery of temporal patterns. Learning rules about the qualitative behaviour of time series. In: Proceedings of the European conference on principles of data mining and knowledge discovery (PKDD)

  21. Höppner F (2003) Knowledge discovery from sequential data, PhD thesis. Technical University Braunschweig, Germany

  22. Kam P-S, Fu AW-C (2000) Discovering temporal patterns for interval-based events. In: Proceedings of the international conference on data warehousing and knowledge discovery (DaWaK)

  23. Kavsek B, Lavrač N (2006) APRIORI-SD: adapting association rule learning to subgroup discovery. Appl Artif Intell 20(7):543–583


  24. Keogh E, Chu S, Hart D, Pazzani M (1993) Segmenting time series: a survey and novel approach. In: Data mining in time series databases. World Scientific, pp 1–22

  25. Li L, Prakash BA, Faloutsos C (2010) Parsimonious linear fingerprinting for time series. PVLDB 3:385–396


  26. Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the international conference on data mining (ICDM)

  27. Mitchell TM (1997) Machine learning. McGraw-Hill Inc., New York


  28. Moerchen F (2006a) Algorithms for time series knowledge mining. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)

  29. Moerchen F (2006b) Time series knowledge mining, PhD thesis. Philipps-University Marburg

  30. Moskovitch R, Shahar Y (2009) Medical temporal-knowledge discovery via temporal abstraction. In: Proceedings of the American Medical Informatics Association (AMIA)

  31. Papadimitriou S, Sun J, Faloutsos C (2005) Streaming pattern discovery in multiple time-series. In: Proceedings of the international conference on very large data bases (VLDB)

  32. Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (2005) Discovering frequent arrangements of temporal intervals. In: Proceedings of the international conference on data mining (ICDM)

  33. Patel D, Hsu W, Lee ML (2008a) Mining relationships among interval-based events for classification. In: Proceedings of the international conference on management of data (SIGMOD)

  34. Patel D, Hsu W, Lee ML (2008b) Mining relationships among interval-based events for classification. In: Proceedings of the international conference on management of data (SIGMOD)

  35. Pei J, Han J, Mortazavi-asl B, Pinto H, Chen Q, Dayal U, Hsu MC (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the international conference on data engineering (ICDE)

  36. Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28:133–160


  37. Pendelton R, Wheeler M, Rodgers G (2006) Argatroban dosing of patients with heparin induced thrombocytopenia and an elevated aPTT due to antiphospholipid antibody syndrome. Ann Pharmacother 40:972–976


  38. Ratanamahatana C, Keogh EJ (2005) Three myths about dynamic time warping data mining. In: Proceedings of the SIAM international conference on data mining (SDM)

  39. Sacchi L, Larizza C, Combi C, Bellazzi R (2007) Data mining with temporal abstractions: learning rules from time series. Data Min Knowl Discov 15(2):217–247

  40. Shahar Y (1997) A framework for knowledge-based temporal abstraction. Artif Intell 90:79–133


  41. Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the international conference on extending database technology (EDBT)

  42. Srivastava A, Kundu A, Sural S, Majumdar AK (2008) Credit card fraud detection using hidden Markov model. IEEE Trans Dependable Secure Comput 5(1):37–48

  43. Vail DL, Veloso MM, Lafferty JD (2007) Conditional random fields for activity recognition. In: Proceedings of the international joint conference on autonomous agents and multiagent systems (AAMAS)

  44. Warkentin T (2000) Heparin-induced thrombocytopenia: pathogenesis and management. Br J Haematol 121:535–555


  45. Webb GI (2007) Discovering significant patterns. Mach Learn 68(1):1–33


  46. Weng X, Shen J (2008) Classification of multivariate time series using two-dimensional singular value decomposition. Knowl Based Syst 21(7):535–539


  47. Winarko E, Roddick JF (2007) ARMADA—an algorithm for discovering richer relative temporal association rules from interval-based data. Data Knowl Eng 63:76–90


  48. Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19:742–758


  49. Xin D, Cheng H, Yan X, Han J (2006) Extracting redundancy-aware top-k patterns. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)

  50. Yan X, Han J, Afshar R (2003) CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of the SIAM international conference on data mining (SDM)

  51. Yang K, Shahabi C (2004) A PCA-based similarity measure for multivariate time series. In: Proceedings of the international workshop on multimedia databases

  52. Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12:372–390


  53. Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60



Author information


Corresponding author

Correspondence to Iyad Batal.

Appendix: The Bayesian score: mathematical derivation and computational complexity


In Sect. 5.1, we briefly introduced the Bayesian score of RTP \(P\) for predicting class label \(y\) compared to a more general group \(G\): \(G_P\!\subset \!G\). In this appendix, we derive the marginal likelihoods for models \(M_e,\,M_h\) and \(M_l\), which are required for computing the Bayesian score (solving Eq. 1). Section “The closed-form solution of the marginal likelihood for model \(M_e\)” describes the closed-form solution for \(P(G|M_e)\), the marginal likelihood for model \(M_e\) (the probability of \(y\) is the same inside and outside \(G_P\)). Section “Deriving a closed-form solution of the marginal likelihood for model \(M_h\)” derives the closed-form solution for \(P(G|M_h)\), the marginal likelihood for model \(M_h\) (the probability of \(y\) in \(G_P\) is higher than outside \(G_P\)). Section “Four equivalent solutions of the marginal likelihood for model \(M_h\)” lists four equivalent formulas for computing \(P(G|M_h)\). Section “Deriving a closed-form solution of the marginal likelihood for model \(M_l\)” shows how to obtain the marginal likelihood for model \(M_l\) (the probability of \(y\) in \(G_P\) is lower than outside \(G_P\)) directly from the solution for \(P(G|M_h)\). Finally, “Computational complexity” analyzes the overall computational complexity of computing the Bayesian score.

1.1 The closed-form solution of the marginal likelihood for model \(M_e\)

Let us start by defining the marginal likelihood for model \(M_e\). This model assumes that all instances in \(G\) have the same probability of having label \(Y=y\). Let us denote this probability by \(\theta \). To represent our uncertainty about \(\theta \), we use a beta distribution with parameters \(\alpha \) and \(\beta \). Let \(N_{*1}\) be the number of instances in \(G\) with \(Y=y\) and let \(N_{*2}\) be the number of instances in \(G\) with \(Y\!\ne \!y\) (i.e., instances that do not have class label \(y\)). The marginal likelihood for model \(M_e\) is as follows:

$$\begin{aligned} Pr(G|M_e)=\int _{\theta =0}^1 {\theta }^{N_{*1}} \cdot {(1-\theta )}^{N_{*2}} \cdot beta({\theta }; {\alpha }, {\beta }) d{\theta } \end{aligned}$$

The above integral yields the following well-known closed-form solution [19]:

$$\begin{aligned} Pr(G|M_e)=\frac{\varGamma (\alpha +\beta )}{\varGamma (\alpha +N_{*1}+\beta +N_{*2})} \cdot \frac{\varGamma (\alpha +N_{*1})}{\varGamma (\alpha )} \cdot \frac{\varGamma (\beta +N_{*2})}{\varGamma (\beta )} \end{aligned}$$
(2)

where \(\varGamma \) is the gamma function.
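As an illustrative sketch (not the paper's code), Eq. 2 can be evaluated numerically in log space with the log-gamma function; the function name `log_marginal_me`, its argument names, and the default uniform prior \(\alpha =\beta =1\) are our own choices:

```python
import math

def log_marginal_me(n1, n2, alpha=1, beta=1):
    """ln Pr(G|M_e) from Eq. 2: the beta-binomial marginal likelihood.

    n1 and n2 play the roles of N_*1 and N_*2 (counts of instances in G
    with Y = y and Y != y); alpha and beta are the beta-prior parameters.
    """
    lg = math.lgamma  # lg(x) = ln Gamma(x)
    return (lg(alpha + beta) - lg(alpha + n1 + beta + n2)
            + lg(alpha + n1) - lg(alpha)
            + lg(beta + n2) - lg(beta))
```

For example, with a uniform prior and a single positive instance, the formula recovers \(\int _0^1 \theta \, d\theta = 1/2\).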

1.2 Deriving a closed-form solution of the marginal likelihood for model \(M_h\)

Let us now define the marginal likelihood for model \(M_h\). This model assumes that the probability of \(Y=y\) for the instances in \(G_P\), denoted by \(\theta _1\), is higher than the probability of \(Y=y\) for the instances of \(G\) that are outside \(G_P\) (\(G \setminus G_P\)), denoted by \(\theta _2\). To represent our uncertainty about \(\theta _1\), we use a beta distribution with parameters \(\alpha _1\) and \(\beta _1\); to represent our uncertainty about \(\theta _2\), we use a beta distribution with parameters \(\alpha _2\) and \(\beta _2\). Let \(N_{11}\) and \(N_{12}\) be the number of instances in \(G_P\) with \(Y=y\) and with \(Y\!\ne \!y\), respectively. Let \(N_{21}\) and \(N_{22}\) be the number of instances outside \(G_P\) with \(Y=y\) and with \(Y\!\ne \!y\), respectively (see Fig. 12).

Fig. 12: A diagram illustrating model \(M_h\)

The marginal likelihood for model \(M_h\) (\(Pr(G|M_h)\)) is defined as follows:

$$\begin{aligned}&\frac{1}{k}\int _{\theta _1=0}^1 \int _{\theta _2=0}^{\theta _1} {\theta _1}^{N_{11}} \cdot {(1-\theta _1)}^{N_{12}} \cdot {\theta _2}^{N_{21}} \cdot {(1-\theta _2)}^{N_{22}} \cdot beta({\theta _1}; {\alpha _1}, {\beta _1})\nonumber \\&\qquad \cdot beta({\theta _2}; {\alpha _2}, {\beta _2}) d{\theta _2} d{\theta _1}\nonumber \\&\quad =\frac{1}{k}\underbrace{\int _{\theta _1=0}^1 {\theta _1}^{N_{11}} \cdot {(1-\theta _1)}^{N_{12}} \cdot beta({\theta _1}; {\alpha _1}, {\beta _1})}_{f_1}\nonumber \\&\quad \times \underbrace{\int _{\theta _2=0}^{\theta _1} {\theta _2}^{N_{21}} \cdot {(1-\theta _2)}^{N_{22}} \cdot beta({\theta _2}; {\alpha _2}, {\beta _2}) d{\theta _2}}_{f_2} d{\theta _1} \end{aligned}$$
(3)

where \(k\) is a normalization constant for the parameter prior. Note that we do not assume that the parameters are independent, but rather we constrain \(\theta _1\) to be higher than \(\theta _2\).

To solve Eq. 3, we first show how to solve the integral over \(\theta _2\) in closed form, which is denoted by \(f_2\) in Eq. 3. We then expand the function denoted by \(f_1\), multiply it by the solution to \(f_2\), and solve the integral over \(\theta _1\) in closed form to complete the integration.

We use the regularized incomplete beta function [1] to solve the integral given by \(f_2\), which is as follows:

$$\begin{aligned} \int _{\theta =0}^{x} {\theta }^{a-1} \cdot {(1\!-\!\theta )}^{b-1} d{\theta } = \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a\!+\!b)} \cdot \sum _{j=a}^{a+b-1} \frac{\varGamma (a\!+\!b)}{\varGamma (j\!+\!1) \cdot \varGamma (a\!+\!b\!-\!j)} \cdot x^j \cdot (1\!-\!x)^{a+b-1-j} \end{aligned}$$
(4)

where \(a\) and \(b\) should be natural numbers.

Note that when \(x=1\) in Eq. 4, the solution to the integral in that equation is simply the following:

$$\begin{aligned} \int _{\theta =0}^{1} {\theta }^{a-1} \cdot {(1-\theta )}^{b-1} d{\theta } = \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a+b)} \end{aligned}$$
(5)
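As a numerical sanity check of Eq. 4 (our own sketch; the function names and the midpoint-rule comparison are illustrative, not from the paper), the finite sum on the right-hand side can be compared against a direct numerical approximation of the incomplete beta integral:

```python
import math

def incomplete_beta_sum(a, b, x):
    """Right-hand side of Eq. 4, for natural numbers a, b and 0 < x < 1."""
    lg = math.lgamma
    log_front = lg(a) + lg(b) - lg(a + b)  # Gamma(a)Gamma(b)/Gamma(a+b)
    total = 0.0
    for j in range(a, a + b):  # j = a .. a+b-1
        total += math.exp(lg(a + b) - lg(j + 1) - lg(a + b - j)
                          + j * math.log(x) + (a + b - 1 - j) * math.log1p(-x))
    return math.exp(log_front) * total

def incomplete_beta_int(a, b, x, n=200_000):
    """Left-hand side of Eq. 4: midpoint-rule approximation of the integral."""
    h = x / n
    return sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
               for i in range(n)) * h
```

For instance, with \(a=2,\,b=3,\,x=0.5\) both sides evaluate to \(\int _0^{0.5} \theta (1-\theta )^2 d\theta \approx 0.0573\).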

We now solve the integral given by \(f_2\) in Eq. 3 as follows:

$$\begin{aligned} f_2&= \int _{\theta _2=0}^{\theta _1} {\theta _2}^{N_{21}} \cdot {(1-\theta _2)}^{N_{22}} \cdot beta(\theta _2; \alpha _2, \beta _2) d \theta _2\\&= \int _{\theta _2=0}^{\theta _1} {\theta _2}^{N_{21}} \cdot {(1-\theta _2)}^{N_{22}} \cdot \frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \cdot \theta _2^{\alpha _2-1} \cdot (1-\theta _2)^{\beta _2-1} d \theta _2\\&= \frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \int _{\theta _2=0}^{\theta _1} {\theta _2}^{N_{21}+\alpha _2-1} \cdot {(1-\theta _2)}^{N_{22}+\beta _2-1} d \theta _2\\&= \frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \int _{\theta _2=0}^{\theta _1} {\theta _2}^{a-1} \cdot {(1-\theta _2)}^{b-1} d \theta _2 \end{aligned}$$

where \(a=N_{21}+\alpha _2\) and \(b=N_{22}+\beta _2\).

Using Eq. 4, we get the following:

$$\begin{aligned} f_2=\frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \cdot \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a+b)} \cdot \sum _{j=a}^{a+b-1} \frac{\varGamma (a+b)}{\varGamma (j+1) \cdot \varGamma (a+b-j)} \cdot \theta _1^j \cdot (1-\theta _1)^{a+b-1-j} \end{aligned}$$
(6)

We now turn to \(f_1\), which can be expanded as follows:

$$\begin{aligned} f_1&= \int _{\theta _1=0}^1 {\theta _1}^{N_{11}} \cdot {(1-\theta _1)}^{N_{12}} \cdot beta({\theta _1}; {\alpha _1}, {\beta _1})\nonumber \\&= \int _{\theta _1=0}^1 {\theta _1}^{N_{11}} \cdot {(1-\theta _1)}^{N_{12}} \cdot \frac{\varGamma (\alpha _1+\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \cdot \theta _1^{\alpha _1-1} \cdot (1-\theta _1)^{\beta _1-1}\nonumber \\&= \frac{\varGamma (\alpha _1+\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \int _{\theta _1=0}^1 {\theta _1}^{c-1} \cdot {(1-\theta _1)}^{d-1} \end{aligned}$$
(7)

where \(c=N_{11}+\alpha _1\) and \(d=N_{12}+\beta _1\).

Now, we combine Eqs. 6 and 7 to solve Eq. 3:

$$\begin{aligned} Pr(G|M_h)&= \frac{1}{k} \cdot f_1 \cdot f_2 d \theta _1\\&=\frac{1}{k} \cdot \frac{\varGamma (\alpha _1+\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \cdot \int _{\theta _1=0}^1 {\theta _1}^{c-1} \cdot {(1\!-\!\theta _1)}^{d-1} \cdot \frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \cdot \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a+b)}\\&\quad \cdot \sum _{j=a}^{a+b-1} \frac{\varGamma (a+b)}{\varGamma (j+1) \cdot \varGamma (a+b-j)} \cdot \theta _1^j \cdot (1-\theta _1)^{a+b-1-j} d \theta _1\\&=\frac{1}{k} \cdot \frac{\varGamma (\alpha _1\!+\!\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \cdot \frac{\varGamma (\alpha _2\!+\!\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \cdot \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a\!+\!b)} \cdot \int _{\theta _1=0}^1 {\theta _1}^{c-1} \cdot {(1\!-\!\theta _1)}^{d-1}\\&\quad \cdot \sum _{j=a}^{a+b-1} \frac{\varGamma (a+b)}{\varGamma (j+1)\cdot \varGamma (a+b-j)} \cdot \theta _1^j \cdot (1-\theta _1)^{a+b-1-j} d \theta _1\\&=\frac{1}{k} \cdot \frac{\varGamma (\alpha _1+\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \cdot \frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \cdot \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a+b)}\\&\quad \cdot \sum _{j=a}^{a+b-1} \frac{\varGamma (a\!+\!b)}{\varGamma (j\!+\!1)\cdot \varGamma (a\!+\!b\!-\!j)} \cdot \int _{\theta _1=0}^1 {\theta _1}^{(c+j)-1} \cdot {(1\!-\!\theta _1)}^{(a+b+d-1-j)-1} d \theta _1\\ \end{aligned}$$

which by Eq. 5 is

$$\begin{aligned} \quad Pr(G|M_h)&=\frac{1}{k} \cdot \frac{\varGamma (\alpha _1+\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \cdot \frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \cdot \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a+b)}\nonumber \\&\quad \cdot \sum _{j=a}^{a+b-1} \frac{\varGamma (a+b)}{\varGamma (j+1)\cdot \varGamma (a+b-j)} \cdot \frac{\varGamma (c+j) \cdot \varGamma (a+b+d-1-j)}{\varGamma (a+b+c+d-1)}\nonumber \\&=\frac{1}{k} \cdot \frac{\varGamma (\alpha _1+\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \cdot \frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)}\nonumber \\&\quad \cdot \sum _{j=a}^{a+b-1} \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (j+1)\cdot \varGamma (a+b-j)} \cdot \frac{\varGamma (c+j) \cdot \varGamma (a+b+d-1-j)}{\varGamma (a+b+c+d-1)} \quad \end{aligned}$$
(8)

where \(a=N_{21}+\alpha _2,\,b=N_{22}+\beta _2,\,c=N_{11}+\alpha _1\) and \(d=N_{12}+\beta _1\).

We can solve for \(k\) (the normalization constant for the parameter prior) by solving Eq. 3 (without the \(k\) term) with \(N_{11}=N_{12}=N_{21}=N_{22}=0\). Doing so is equivalent to applying Eq. 8 (without the \(k\) term) with \(a=\alpha _2,\,b=\beta _2,\,c=\alpha _1\) and \(d=\beta _1\). Note that \(k=\frac{1}{2}\) if we use uniform priors on both parameters by setting \(\alpha _1=\beta _1=\alpha _2=\beta _2=1\).
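The computation of Eq. 8 can be sketched as follows (our own illustration: the function names are ours, and the log-space sum is accumulated with the usual max-shift trick for numerical stability). As noted above, the constant \(k\) is obtained by evaluating the same expression with all counts set to zero:

```python
import math

def _log_sum_eq8(a, b, c, d):
    """ln of the sum in Eq. 8, where a = N21+alpha2, b = N22+beta2,
    c = N11+alpha1 and d = N12+beta1 (positive integers)."""
    lg = math.lgamma
    terms = [lg(a) + lg(b) - lg(j + 1) - lg(a + b - j)
             + lg(c + j) + lg(a + b + d - 1 - j) - lg(a + b + c + d - 1)
             for j in range(a, a + b)]  # j = a .. a+b-1
    m = max(terms)  # shift by the max term before exponentiating
    return m + math.log(sum(math.exp(t - m) for t in terms))

def log_pr_mh(N11, N12, N21, N22, a1=1, b1=1, a2=1, b2=1):
    """ln Pr(G|M_h) via Eq. 8; k comes from the zero-count case."""
    lg = math.lgamma
    log_prior = (lg(a1 + b1) - lg(a1) - lg(b1)
                 + lg(a2 + b2) - lg(a2) - lg(b2))
    log_k = log_prior + _log_sum_eq8(a2, b2, a1, b1)  # all counts zero
    return log_prior - log_k + _log_sum_eq8(N21 + a2, N22 + b2, N11 + a1, N12 + b1)
```

With uniform priors this reproduces \(k=1/2\); for example, \(N_{11}=1\) with all other counts zero gives \(Pr(G|M_h)=2\int _0^1 \theta _1^2 \, d\theta _1 = 2/3\).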

1.3 Four equivalent solutions of the marginal likelihood for model \(M_h\)

In the previous section, we showed the full derivation of the closed-form solution of the marginal likelihood for model \(M_h\). It turns out that there are four equivalent solutions to Eq. 3. Let us use the notation introduced in the previous section: \(a=N_{21}+\alpha _2,\,b=N_{22}+\beta _2,\,c=N_{11}+\alpha _1\) and \(d=N_{12}+\beta _1\). Also, let us define \(C\) as follows:

$$\begin{aligned} C=\frac{1}{k} \cdot \frac{\varGamma ({\alpha _1}+{\beta _1})}{\varGamma ({\alpha _1}) \cdot \varGamma ({\beta _1})}\cdot \frac{\varGamma ({\alpha _2}+{\beta _2})}{\varGamma ({\alpha _2}) \cdot \varGamma ({\beta _2})} \end{aligned}$$
(9)

The marginal likelihood of \(M_h\) (Eq. 3) can be obtained by solving any of the following four equations:

$$\begin{aligned} C \cdot \sum _{j=a}^{a+b-1} \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (j+1) \cdot \varGamma (a+b-j)} \cdot \frac{\varGamma (c+j) \cdot \varGamma (a+b+d-j-1)}{\varGamma (a+b+c+d-1)} \end{aligned}$$
(10)

which is the solution we derived in the previous section.

$$\begin{aligned}&C \cdot \sum _{j=d}^{d+c-1} \frac{\varGamma (c) \cdot \varGamma (d)}{\varGamma (j+1) \cdot \varGamma (c+d-j)} \cdot \frac{\varGamma (b+j) \cdot \varGamma (c+d+a-j-1)}{\varGamma (a+b+c+d-1)}\end{aligned}$$
(11)
$$\begin{aligned}&C \cdot \left( \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a+b)} \cdot \frac{\varGamma (c) \cdot \varGamma (d)}{\varGamma (c+d)} - \sum _{j=b}^{a+b-1} \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (j+1) \cdot \varGamma (a+b-j)} \right. \nonumber \\&\qquad \left. \cdot \frac{\varGamma (d+j) \cdot \varGamma (a+b+c-j-1)}{\varGamma (a+b+c+d-1)} \right) \end{aligned}$$
(12)
$$\begin{aligned}&C \cdot \left( \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a+b)} \cdot \frac{\varGamma (c) \cdot \varGamma (d)}{\varGamma (c+d)} - \sum _{j=c}^{c+d-1} \frac{\varGamma (c) \cdot \varGamma (d)}{\varGamma (j+1) \cdot \varGamma (c+d-j)} \right. \nonumber \\&\qquad \left. \cdot \frac{\varGamma (a+j) \cdot \varGamma (c+d+b-j-1)}{\varGamma (a+b+c+d-1)} \right) \end{aligned}$$
(13)

1.4 Deriving a closed-form solution of the marginal likelihood for model \(M_l\)

Lastly, let us define the marginal likelihood for model \(M_l\), which assumes that the probability of \(Y=y\) for the instances in \(G_P\) (\(\theta _1\)) is lower than the probability of \(Y=y\) for the instances of \(G\) that are outside \(G_P\) (\(\theta _2\)). The marginal likelihood for \(M_l\) is similar to Eq. 3, but integrates \(\theta _2\) from 0 to 1 and constrains \(\theta _1\) to be integrated from 0 to \(\theta _2\) (forcing \(\theta _1\) to be smaller than \(\theta _2\)) as follows:

$$\begin{aligned}&\frac{1}{k}\underbrace{\int _{\theta _2=0}^1 {\theta _2}^{N_{21}} \cdot {(1-\theta _2)}^{N_{22}} \cdot beta({\theta _2}; {\alpha _2}, {\beta _2})}_{f_1} \nonumber \\&\quad \times \underbrace{\int _{\theta _1=0}^{\theta _2} {\theta _1}^{N_{11}} \cdot {(1-\theta _1)}^{N_{12}} \cdot beta({\theta _1}; {\alpha _1}, {\beta _1}) d{\theta _1}}_{f_2} d{\theta _2} \end{aligned}$$
(14)

By solving the integral given by \(f_2\), we get:

$$\begin{aligned} f_2&=\frac{\varGamma (\alpha _1+\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \int _{\theta _1=0}^{\theta _2} {\theta _1}^{c-1} \cdot {(1-\theta _1)}^{d-1} d \theta _1\\&=\frac{\varGamma (\alpha _1\!+\!\beta _1)}{\varGamma (\alpha _1) \cdot \varGamma (\beta _1)} \cdot \frac{\varGamma (c) \cdot \varGamma (d)}{\varGamma (c\!+\!d)} \cdot \sum _{j=c}^{c+d-1} \frac{\varGamma (c\!+\!d)}{\varGamma (j\!+\!1) \cdot \varGamma (c\!+\!d\!-\!j)} \cdot \theta _2^j \cdot (1-\theta _2)^{c+d-1-j} \end{aligned}$$

where, as before, \(c=N_{11}+\alpha _1\) and \(d=N_{12}+\beta _1\).

By solving \(f_1\), we get:

$$\begin{aligned} f_1 =\frac{\varGamma (\alpha _2+\beta _2)}{\varGamma (\alpha _2) \cdot \varGamma (\beta _2)} \int _{\theta _2=0}^1 {\theta _2}^{a-1} \cdot {(1-\theta _2)}^{b-1} \end{aligned}$$

where, as before, \(a=N_{21}+\alpha _2\) and \(b=N_{22}+\beta _2\).

Now, we can solve Eq. 14:

$$\begin{aligned} \quad Pr(G|M_l) = C \cdot \sum _{j=c}^{c+d-1} \frac{\varGamma (c) \cdot \varGamma (d)}{\varGamma (j+1)\cdot \varGamma (c+d-j)} \cdot \frac{\varGamma (a+j) \cdot \varGamma (c+d+b-1-j)}{\varGamma (a+b+c+d-1)} \end{aligned}$$
(15)

where \(C\) is the same constant we defined by Eq. 9 in the previous section.

Notice that Eq. 15 [the solution to \(Pr(G|M_l)\)] can be obtained from Eq. 13 [one of the four solutions to \(Pr(G|M_h)\)] as follows:

$$\begin{aligned} Pr(G|M_l)=C \cdot \frac{\varGamma (a) \cdot \varGamma (b)}{\varGamma (a+b)} \cdot \frac{\varGamma (c) \cdot \varGamma (d)}{\varGamma (c+d)} - Pr(G|M_h) \end{aligned}$$
(16)

It turns out that no matter which formula we use to solve \(Pr(G|M_h)\), we can use Eq. 16 to obtain \(Pr(G|M_l)\).
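For the special case of uniform priors (where \(k=1/2\) and hence \(C=2\)), Eq. 15 reduces to a short sum; the sketch below (our own naming, restricted to that uniform-prior assumption) evaluates it in log space:

```python
import math

def log_pr_ml(N11, N12, N21, N22):
    """ln Pr(G|M_l) from Eq. 15, assuming uniform priors
    alpha_1 = beta_1 = alpha_2 = beta_2 = 1 (so k = 1/2 and C = 2)."""
    lg = math.lgamma
    a, b, c, d = N21 + 1, N22 + 1, N11 + 1, N12 + 1
    terms = [lg(c) + lg(d) - lg(j + 1) - lg(c + d - j)
             + lg(a + j) + lg(c + d + b - 1 - j) - lg(a + b + c + d - 1)
             for j in range(c, c + d)]  # j = c .. c+d-1
    m = max(terms)  # stable log-sum-exp
    return math.log(2) + m + math.log(sum(math.exp(t - m) for t in terms))
```

For instance, a single positive instance inside \(G_P\) (\(N_{11}=1\), all other counts zero) gives \(Pr(G|M_l)=2\int _0^1\int _0^{\theta _2}\theta _1\,d\theta _1\,d\theta _2 = 1/3\), the complement under Eq. 16 of the \(2/3\) obtained for \(M_h\) in the same setting.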

1.5 Computational complexity

Since we require that \(N_{11},\,N_{12},\,N_{21},\,N_{22},\,\alpha _1,\,\beta _1,\,\alpha _2\) and \(\beta _2\) be natural numbers, the gamma function simply becomes a factorial function: \(\varGamma (x)=(x-1)!\). Since such numbers can become very large, it is convenient to use the logarithm of the gamma function and express Eqs. 2, 10, 11, 12, 13 and 16 in logarithmic form in order to preserve numerical precision. The logarithm of the integer gamma function can be pre-computed and efficiently stored in an array as follows:

$$\begin{aligned}&lnGamma[1]=0\\&\quad \hbox {For} \,\,i=2 \,\,\hbox {to}\,\, n\\&\qquad \qquad lnGamma[i]=lnGamma[i-1] + ln(i-1)\\ \end{aligned}$$

We can then use \(lnGamma\) in solving the above equations. However, Eqs. 10, 11, 12 and 13 include a sum, which makes the use of the logarithmic form more involved. To deal with this issue, we can define a function \(lnAdd\), which takes two arguments \(x\) and \(y\) that are in logarithmic form and returns \(ln(e^x + e^y)\). It does so in a way that preserves numerical precision that would be lost if \(ln(e^x + e^y)\) were calculated directly, by using the following formula:

$$\begin{aligned} lnAdd(x,y) = x + ln(1+e^{(y-x)}) \end{aligned}$$

Now that we have introduced the functions \(lnGamma\) and \(lnAdd\), it is straightforward to evaluate Eqs. 2, 10, 11, 12, 13 and 16 in logarithmic form.
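The \(lnGamma\) precomputation and the \(lnAdd\) formula above can be transliterated directly into code (a minimal sketch; the names follow the text, and we take \(x \ge y\) before exponentiating so \(e^{(y-x)}\) cannot overflow):

```python
import math

def build_ln_gamma(n):
    """lnGamma[i] = ln Gamma(i) = ln((i-1)!) for i = 1..n, filled
    incrementally as in the pseudocode above (index 0 is unused)."""
    lnGamma = [0.0] * (n + 1)  # lnGamma[1] = 0
    for i in range(2, n + 1):
        lnGamma[i] = lnGamma[i - 1] + math.log(i - 1)
    return lnGamma

def ln_add(x, y):
    """lnAdd(x, y) = ln(e^x + e^y), computed as x + ln(1 + e^(y-x))."""
    if x < y:
        x, y = y, x  # ensure the exponent y - x is non-positive
    return x + math.log1p(math.exp(y - x))
```

The table costs \(O(n)\) time and space once, after which every \(\varGamma\) lookup in Eqs. 10 to 13 is \(O(1)\).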

Let us now analyze the overall computational complexity of computing the Bayesian score for a specific rule (solving Eq. 1). Doing so requires computing \(Pr(M_e|G),\,Pr(M_h|G)\) and \(Pr(M_l|G)\). \(Pr(M_e|G)\) can be computed in \(O(1)\) using Eq. 2. \(Pr(M_h|G)\) can be computed by applying Eq. 10, 11, 12 or 13, whose computational complexities are \(O(N_{22}+\beta _2),\,O(N_{11}+\alpha _1),\,O(N_{21}+\alpha _2)\) and \(O(N_{12}+\beta _1)\), respectively. Therefore, \(Pr(M_h|G)\) can be computed in \(O(\min (N_{11}+\alpha _1,N_{12}+\beta _1,N_{21}+\alpha _2,N_{22}+\beta _2))\). \(Pr(M_l|G)\) can be computed from \(Pr(M_h|G)\) in \(O(1)\) using Eq. 16. Assuming that \(\alpha _1,\,\beta _1,\,\alpha _2\) and \(\beta _2\) are bounded from above, the overall complexity of computing the Bayesian score is \(O(\min (N_{11},N_{12},N_{21},N_{22}))\).

About this article

Cite this article

Batal, I., Cooper, G.F., Fradkin, D. et al. An efficient pattern mining approach for event detection in multivariate temporal data. Knowl Inf Syst 46, 115–150 (2016). https://doi.org/10.1007/s10115-015-0819-6

