Abstract
This work proposes a pattern mining approach to learn event detection models from complex multivariate temporal data, such as electronic health records. We present recent temporal pattern (RTP) mining, a novel approach for efficiently finding predictive patterns for event detection problems. This approach first converts the time series data into time-interval sequences of temporal abstractions. It then constructs more complex time-interval patterns backward in time using temporal operators. We also present the minimal predictive recent temporal patterns (MPRTP) framework for selecting a small set of predictive and non-spurious patterns. We apply our methods to predicting adverse medical events in real-world clinical data. The results demonstrate the benefits of our methods in learning accurate event detection models, which is a key step for developing intelligent patient monitoring and decision support systems.
Notes
Sequential pattern mining is a special case of time-interval pattern mining, in which all intervals are simply time points with zero durations.
If \(E.s = E.e\), state interval \(E\) corresponds to a time point.
If two state intervals have the same start time, we sort them by their end time. If they also have the same end time, we sort them by lexical order of their variable names (as proposed by [21]).
This section contains materials that have been published in [6].
It is more efficient to mine patterns that cover more than \(n\) instances in one of the classes compared to mining patterns that cover more than \(n\) instances in the entire database (the former is always a subset of the latter).
The observations of the clinical variables are irregular in time because they are measured asynchronously at different time moments.
We apply statistical significance testing with k-fold cross-validation. In this setting, the testing sets are independent of each other, but the training sets are not. Even though this does not perfectly fit the i.i.d. assumption, the significance results are still of great help in comparing different learning methods [27].
As discussed in Sect. 4.2, we mine frequent patterns for the positives and negatives separately using the local minimum supports.
Most of the highest-scoring MPRTPs predict the RENAL category because it is the easiest prediction task. To diversify the patterns, we therefore show the top three predictive MPRTPs for RENAL and the top two MPRTPs for each of the other categories.
References
Abramowitz M, Stegun IA (1964) Handbook of mathematical functions with formulas, graphs, and mathematical tables
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the international conference on very large data bases (VLDB)
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the international conference on data engineering (ICDE)
Allen F (1984) Towards a general theory of action and time. Artif Intell 23:123–154
Batal I, Cooper G, Hauskrecht M (2012) A Bayesian scoring technique for mining predictive and non-spurious rules. In: Proceedings of the European conference on principles of data mining and knowledge discovery (PKDD)
Batal I, Fradkin D, Harrison J, Moerchen F, Hauskrecht M (2012) Mining recent temporal patterns for event detection in multivariate time series data. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)
Batal I, Hauskrecht M (2009) A supervised time series feature extraction technique using DCT and DWT. In: International conference on machine learning and applications (ICMLA)
Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2011) A pattern mining approach for classifying multivariate temporal data. In: Proceedings of the IEEE international conference on bioinformatics and biomedicine (BIBM)
Batal I, Valizadegan H, Cooper GF, Hauskrecht M (2013) A temporal pattern mining approach for classifying electronic health record data. ACM Trans Intell Syst Technol 4(4). doi:10.1145/2508037.2508044
Blasiak S, Rangwala H (2011) A hidden Markov model variant for sequence classification. In: Proceedings of the international joint conferences on artificial intelligence (IJCAI)
Chandola V, Eilertson E, Ertoz L, Simon G, Kumar V (2006) Data mining for cyber security. In: Data warehousing and data mining techniques for computer security. Springer, Berlin
Cheng H, Yan X, Han J, Hsu C-W (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of the international conference on data engineering (ICDE)
Deshpande M, Kuramochi M, Wale N, Karypis G (2005) Frequent substructure-based approaches for classifying chemical compounds. IEEE Trans Knowl Data Eng 17:1036–1050
Exarchos TP, Tsipouras MG, Papaloukas C, Fotiadis DI (2008) A two-stage methodology for sequence classification based on sequential pattern mining and optimization. Data Knowl Eng 66:467–487
Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. ACM Comput Surv 38(3)
Guttormsson SE, Marks RJ, El-Sharkawi MA, Kerszenbaum I (1999) Elliptical novelty grouping for on-line short-turn detection of excited running rotors. IEEE Trans Energy Convers 14(1):16–22
Hauskrecht M, Batal I, Valko M, Visweswaram S, Cooper G, Clermont G (2012) Outlier detection for patient monitoring and alerting. J Biomed Inform 46(1):47–55
Hauskrecht M, Valko M, Batal I, Clermont G, Visweswaram S, Cooper G (2010) Conditional outlier detection for clinical alerting. In: Proceedings of the American Medical Informatics Association (AMIA)
Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: the combination of knowledge and statistical data. Mach Learn 20:197–243
Höppner F (2001) Discovery of temporal patterns. Learning rules about the qualitative behaviour of time series. In: Proceedings of the European conference on principles of data mining and knowledge discovery (PKDD)
Höppner F (2003) Knowledge discovery from sequential data, PhD thesis. Technical University Braunschweig, Germany
Kam P-S, Fu AW-C (2000) Discovering temporal patterns for interval-based events. In: Proceedings of the international conference on data warehousing and knowledge discovery (DaWaK)
Kavsek B, Lavrač N (2006) APRIORI-SD: adapting association rule learning to subgroup discovery. Appl Artif Intell 20(7):543–583
Keogh E, Chu S, Hart D, Pazzani M (1993) Segmenting time series: a survey and novel approach. In: Data mining in time series databases. World Scientific, pp 1–22
Li L, Prakash BA, Faloutsos C (2010) Parsimonious linear fingerprinting for time series. PVLDB 3:385–396
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the international conference on data mining (ICDM)
Mitchell TM (1997) Machine learning. McGraw-Hill Inc., New York
Moerchen F (2006a) Algorithms for time series knowledge mining. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)
Moerchen F (2006b) Time series knowledge mining, PhD thesis. Philipps-University Marburg
Moskovitch R, Shahar Y (2009) Medical temporal-knowledge discovery via temporal abstraction. In: Proceedings of the American Medical Informatics Association (AMIA)
Papadimitriou S, Sun J, Faloutsos C (2005) Streaming pattern discovery in multiple time-series. In: Proceedings of the international conference on very large data bases (VLDB)
Papapetrou P, Kollios G, Sclaroff S, Gunopulos D (2005) Discovering frequent arrangements of temporal intervals. In: Proceedings of the international conference on data mining (ICDM)
Patel D, Hsu W, Lee ML (2008a) Mining relationships among interval-based events for classification. In: Proceedings of the international conference on management of data (SIGMOD)
Patel D, Hsu W, Lee ML (2008b) Mining relationships among interval-based events for classification, In: Proceedings of the international conference on management of data (SIGMOD)
Pei J, Han J, Mortazavi-asl B, Pinto H, Chen Q, Dayal U, Hsu MC (2001) PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the international conference on data engineering (ICDE)
Pei J, Han J, Wang W (2007) Constraint-based sequential pattern mining: the pattern-growth methods. J Intell Inf Syst 28:133–160
Pendelton R, Wheeler M, Rodgers G (2006) Argatroban dosing of patients with heparin induced thrombocytopenia and an elevated aPTT due to antiphospholipid antibody syndrome. Ann Pharmacother 40:972–976
Ratanamahatana C, Keogh EJ (2005) Three myths about dynamic time warping data mining. In: Proceedings of the SIAM international conference on data mining (SDM)
Sacchi L, Larizza C, Combi C, Bellazzi R (2007) Data mining with temporal abstractions: learning rules from time series. Data Min Knowl Discov 15(2):217–247
Shahar Y (1997) A framework for knowledge-based temporal abstraction. Artif Intell 90:79–133
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Proceedings of the international conference on extending database technology (EDBT)
Srivastava A, Kundu A, Sural S, Majumdar AK (2008) Credit card fraud detection using hidden Markov model. IEEE Trans Dependable Secure Comput 5(1):37–48
Vail DL, Veloso MM, Lafferty JD (2007) Conditional random fields for activity recognition. In: Proceedings of the international joint conference on autonomous agents and multiagent systems (AAMAS)
Warkentin T (2000) Heparin-induced thrombocytopenia: pathogenesis and management. Br J Haematol 121:535–555
Webb GI (2007) Discovering significant patterns. Mach Learn 68(1):1–33
Weng X, Shen J (2008) Classification of multivariate time series using two-dimensional singular value decomposition. Knowl Based Syst 21(7):535–539
Winarko E, Roddick JF (2007) ARMADA—an algorithm for discovering richer relative temporal association rules from interval-based data. Data Knowl Eng 63:76–90
Wu S-Y, Chen Y-L (2007) Mining nonambiguous temporal patterns for interval-based events. IEEE Trans Knowl Data Eng 19:742–758
Xin D, Cheng H, Yan X, Han J (2006) Extracting redundancy-aware top-k patterns. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD)
Yan X, Han J, Afshar R (2003) CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of the SIAM international conference on data mining (SDM)
Yang K, Shahabi C (2004) A PCA-based similarity measure for multivariate time series. In: Proceedings of the international workshop on multimedia databases
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12:372–390
Zaki MJ (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60
Appendix: The Bayesian score: mathematical derivation and computational complexity
In Sect. 5.1, we briefly introduced the Bayesian score of RTP \(P\) for predicting class label \(y\) relative to a more general group \(G\): \(G_P\!\subset \!G\). In this Appendix, we derive the marginal likelihoods of models \(M_e\), \(M_h\) and \(M_l\), which are required for computing the Bayesian score (solving Eq. 1). Section “The closed-form solution of the marginal likelihood for model \(M_e\)” describes the closed-form solution for \(P(G|M_e)\), the marginal likelihood of model \(M_e\) (the probability of \(y\) is the same inside and outside \(G_P\)). Section “Deriving a closed-form solution of the marginal likelihood for model \(M_h\)” derives the closed-form solution for \(P(G|M_h)\), the marginal likelihood of model \(M_h\) (the probability of \(y\) in \(G_P\) is higher than outside \(G_P\)). Section “Four equivalent solutions of the marginal likelihood for model \(M_h\)” shows four equivalent formulas for computing \(P(G|M_h)\). Section “Deriving a closed-form solution of the marginal likelihood for model \(M_l\)” shows how to obtain the marginal likelihood of model \(M_l\) (the probability of \(y\) in \(G_P\) is lower than outside \(G_P\)) directly from the solution for \(P(G|M_h)\). Finally, Section “Computational complexity” analyzes the overall computational complexity of computing the Bayesian score.
1.1 The closed-form solution of the marginal likelihood for model \(M_e\)
Let us start by defining the marginal likelihood for model \(M_e\). This model assumes that all instances in \(G\) have the same probability of having label \(Y=y\). Let us denote this probability by \(\theta \). To represent our uncertainty about \(\theta \), we use a beta distribution with parameters \(\alpha \) and \(\beta \). Let \(N_{*1}\) be the number of instances in \(G\) with \(Y=y\) and let \(N_{*2}\) be the number of instances in \(G\) with \(Y\!\ne \!y\) (i.e., instances that do not have class label \(y\)). The marginal likelihood for model \(M_e\) is as follows:
\[ P(G\,|\,M_e)=\int _0^1 \theta ^{N_{*1}}(1-\theta )^{N_{*2}}\,\frac{\theta ^{\alpha -1}(1-\theta )^{\beta -1}}{B(\alpha ,\beta )}\,d\theta \]
The above integral yields the following well-known closed-form solution [19]:
\[ P(G\,|\,M_e)=\frac{\varGamma (\alpha +\beta )}{\varGamma (\alpha )\,\varGamma (\beta )}\cdot \frac{\varGamma (N_{*1}+\alpha )\,\varGamma (N_{*2}+\beta )}{\varGamma (N_{*1}+N_{*2}+\alpha +\beta )}\qquad \qquad (2) \]
where \(\varGamma \) is the gamma function.
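As a quick illustrative sketch (not the authors' code), the closed-form solution above can be evaluated in log space with Python's standard library; the function and argument names below are ours:

```python
from math import lgamma, exp

def ln_marginal_lik_Me(n1, n2, alpha=1, beta=1):
    """ln P(G | M_e): beta-binomial marginal likelihood, where n1 instances
    in G have label y, n2 do not, and theta has a Beta(alpha, beta) prior."""
    return (lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
            + lgamma(n1 + alpha) + lgamma(n2 + beta)
            - lgamma(n1 + n2 + alpha + beta))

# With a uniform prior (alpha = beta = 1) and one instance of each label,
# the marginal likelihood is the integral of theta*(1-theta) over [0,1] = 1/6.
print(exp(ln_marginal_lik_Me(1, 1)))  # → approximately 0.16667
```

Working in log space keeps the computation stable even when the counts (and hence the gamma arguments) are large.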
1.2 Deriving a closed-form solution of the marginal likelihood for model \(M_h\)
Let us now define the marginal likelihood for model \(M_h\). This model assumes that the probability of \(Y=y\) for the instances in \(G_P\), denoted by \(\theta _1\), is higher than the probability of \(Y=y\) for the instances of \(G\) that are outside \(G_P\) (\(G\setminus G_P\)), denoted by \(\theta _2\). To represent our uncertainty about \(\theta _1\), we use a beta distribution with parameters \(\alpha _1\) and \(\beta _1\). To represent our uncertainty about \(\theta _2\), we use a beta distribution with parameters \(\alpha _2\) and \(\beta _2\). Let \(N_{11}\) and \(N_{12}\) be the number of instances in \(G_P\) with \(Y=y\) and with \(Y\!\ne \!y\), respectively. Let \(N_{21}\) and \(N_{22}\) be the number of instances outside \(G_P\) with \(Y=y\) and with \(Y\!\ne \!y\), respectively (see Fig. 12).
The marginal likelihood for model \(M_h\) (\(Pr(G|M_h)\)) is defined as follows:
where \(k\) is a normalization constant for the parameter prior. Note that we do not assume that the parameters are independent, but rather we constrain \(\theta _1\) to be higher than \(\theta _2\).
To solve Eq. 3, we first show how to solve the integral over \(\theta _2\) in closed form, which is denoted by \(f_2\) in Eq. 3. We then expand the function denoted by \(f_1\), multiply it by the solution to \(f_2\), and solve the integral over \(\theta _1\) in closed form to complete the integration.
We use the regularized incomplete beta function [1] to solve the integral given by \(f_2\), which is as follows:
\[ \int _0^x \theta ^{a-1}(1-\theta )^{b-1}\,d\theta =\frac{\varGamma (a)\,\varGamma (b)}{\varGamma (a+b)}\sum _{j=a}^{a+b-1}\binom{a+b-1}{j}\,x^{j}(1-x)^{a+b-1-j}\qquad \qquad (4) \]
where \(a\) and \(b\) should be natural numbers.
Note that when \(x=1\) in Eq. 4, the solution to the integral in that equation is simply the following:
\[ \int _0^1 \theta ^{a-1}(1-\theta )^{b-1}\,d\theta =\frac{\varGamma (a)\,\varGamma (b)}{\varGamma (a+b)}\qquad \qquad (5) \]
We now solve the integral given by \(f_2\) in Eq. 3 as follows:
where \(a=N_{21}+\alpha _2\) and \(b=N_{22}+\beta _2\).
Using Eq. 4, we get the following:
We now turn to \(f_1\), which can be expanded as follows:
where \(c=N_{11}+\alpha _1\) and \(d=N_{12}+\beta _1\).
Now, we combine Eqs. 6 and 7 to solve Eq. 3:
which by Eq. 5 is
where \(a=N_{21}+\alpha _2,\,b=N_{22}+\beta _2,\,c=N_{11}+\alpha _1\) and \(d=N_{12}+\beta _1\).
We can solve for \(k\) (the normalization constant for the parameter prior) by solving Eq. 3 (without the \(k\) term) with \(N_{11}=N_{12}=N_{21}=N_{22}=0\). Doing so is equivalent to applying Eq. 8 (without the \(k\) term) with \(a=\alpha _2,\,b=\beta _2,\,c=\alpha _1\) and \(d=\beta _1\). Note that \(k=\frac{1}{2}\) if we use uniform priors on both parameters by setting \(\alpha _1=\beta _1=\alpha _2=\beta _2=1\).
1.3 Four equivalent solutions of the marginal likelihood for model \(M_h\)
In the previous section, we showed the full derivation of the closed-form solution of the marginal likelihood for model \(M_h\). It turns out that there are four equivalent solutions to Eq. 3. Let us use the notation introduced in the previous section: \(a=N_{21}+\alpha _2,\,b=N_{22}+\beta _2,\,c=N_{11}+\alpha _1\) and \(d=N_{12}+\beta _1\). Also, let us define \(C\) as follows:
The marginal likelihood of \(M_h\) (Eq. 3) can be obtained by solving any of the following four equations:
which is the solution we derived in the previous section.
1.4 Deriving a closed-form solution of the marginal likelihood for model \(M_l\)
Lastly, let us define the marginal likelihood for model \(M_l\), which assumes that the probability of \(Y=y\) for the instances in \(G_P\) (\(\theta _1\)) is lower than the probability of \(Y=y\) for the instances of \(G\) that are outside \(G_P\) (\(\theta _2\)). The marginal likelihood for \(M_l\) is similar to Eq. 3, but integrates \(\theta _2\) from 0 to 1 and constrains \(\theta _1\) to be integrated from 0 to \(\theta _2\) (forcing \(\theta _1\) to be smaller than \(\theta _2\)) as follows:
By solving the integral given by \(f_2\), we get:
where, as before, \(c=N_{11}+\alpha _1\) and \(d=N_{12}+\beta _1\).
By solving \(f_1\), we get:
where, as before, \(a=N_{21}+\alpha _2\) and \(b=N_{22}+\beta _2\).
Now, we can solve Eq. 14:
where \(C\) is the same constant we defined by Eq. 9 in the previous section.
Notice that Eq. 15 [the solution to \(Pr(G|M_l)\)] can be obtained from Eq. 13 [one of the four solutions to \(Pr(G|M_h)\)] as follows:
It turns out that no matter which formula we use to solve \(Pr(G|M_h)\), we can use Eq. 16 to obtain \(Pr(G|M_l)\).
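The complement relation behind this shortcut can be checked numerically: the regions \(\theta _2<\theta _1\) (model \(M_h\)) and \(\theta _2>\theta _1\) (model \(M_l\)) partition the unit square, so the two constrained double integrals sum to the unconstrained integral, which factors into a product of two complete beta integrals (Eq. 5). A small illustrative check (function names are ours):

```python
from math import lgamma, exp

def beta_integral(a, b):
    """Integral of t^(a-1)*(1-t)^(b-1) over [0, 1] (Eq. 5)."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))

def constrained_integral(a, b, c, d, higher, n=400):
    """Midpoint-grid approximation of the double integral of
    t1^(c-1)*(1-t1)^(d-1) * t2^(a-1)*(1-t2)^(b-1) over the region
    t2 < t1 (higher=True, as in M_h) or t2 > t1 (higher=False, as in M_l)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t1 = (i + 0.5) * h
        for j in range(n):
            t2 = (j + 0.5) * h
            if (t2 < t1) == higher:
                total += (t1**(c - 1) * (1 - t1)**(d - 1)
                          * t2**(a - 1) * (1 - t2)**(b - 1))
    return total * h * h

a, b, c, d = 2, 3, 3, 2
I_h = constrained_integral(a, b, c, d, higher=True)
I_l = constrained_integral(a, b, c, d, higher=False)
# The two constrained pieces sum to the unconstrained product of beta integrals.
print(I_h + I_l, beta_integral(a, b) * beta_integral(c, d))
```

This is why, once \(Pr(G|M_h)\) is available in closed form, \(Pr(G|M_l)\) comes essentially for free.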
1.5 Computational complexity
Since we require that \(N_{11},\,N_{12},\,N_{21},\,N_{22},\,\alpha _1,\,\beta _1,\,\alpha _2\) and \(\beta _2\) be natural numbers, the gamma function simply becomes a factorial function: \(\varGamma (x)=(x-1)!\). Because these factorials can become very large, it is convenient to use the logarithm of the gamma function and express Eqs. 2, 10, 11, 12, 13 and 16 in logarithmic form in order to preserve numerical precision. The logarithm of the integer gamma function can be pre-computed and efficiently stored in an array as follows:
\[ lnGamma[1]=0,\qquad lnGamma[x]=lnGamma[x-1]+\ln (x-1)\quad \text {for }x\ge 2 \]
We can then use \(lnGamma\) in solving the above equations. However, Eqs. 10, 11, 12 and 13 include a sum, which makes the use of the logarithmic form more involved. To deal with this issue, we define the function \(lnAdd\), which takes two arguments \(x\) and \(y\) that are in logarithmic form and returns \(ln(e^x + e^y)\). It does so in a way that preserves numerical precision that could be lost if \(ln(e^x + e^y)\) were calculated directly, by using the following formula:
\[ lnAdd(x,y)=\max (x,y)+\ln \bigl (1+e^{-|x-y|}\bigr ) \]
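A minimal sketch of these two helpers (our illustrative implementation, not the authors' code):

```python
from math import log, exp

def build_ln_gamma(n_max):
    """Precompute lnGamma[x] = ln((x-1)!) for x = 1 .. n_max."""
    table = [0.0] * (n_max + 1)  # table[1] = ln(0!) = 0
    for x in range(2, n_max + 1):
        table[x] = table[x - 1] + log(x - 1)
    return table

def ln_add(x, y):
    """Return ln(e^x + e^y) without overflow or loss of precision,
    using ln(e^x + e^y) = max(x, y) + ln(1 + e^(-|x - y|))."""
    hi, lo = (x, y) if x >= y else (y, x)
    return hi + log(1.0 + exp(lo - hi))

ln_gamma = build_ln_gamma(1000)
print(ln_gamma[5])             # ln(4!) = ln 24
print(ln_add(log(2), log(3)))  # ln(2 + 3) = ln 5
```

Subtracting the larger argument before exponentiating guarantees that \(e^{lo-hi}\le 1\), so \(lnAdd\) never overflows even when both arguments are huge log-space values.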
Now that we have introduced the functions \(lnGamma\) and \(lnAdd\), it is straightforward to evaluate Eqs. 2, 10, 11, 12, 13 and 16 in logarithmic form.
Let us now analyze the overall computational complexity of computing the Bayesian score for a specific rule (solving Eq. 1). Doing so requires computing \(Pr(M_e|G),\,Pr(M_h|G)\) and \(Pr(M_l|G)\). \(Pr(M_e|G)\) can be computed in \(O(1)\) using Eq. 2. \(Pr(M_h|G)\) can be computed by applying Eq. 10, 11, 12 or 13. The computational complexities of these equations are \(O(N_{22}+\beta _2),\,O(N_{11}+\alpha _1),\,O(N_{21}+\alpha _2)\) and \(O(N_{12}+\beta _1)\), respectively. Therefore, \(Pr(M_h|G)\) can be computed in \(O(\min (N_{11}+\alpha _1,N_{12}+\beta _1,N_{21}+\alpha _2,N_{22}+\beta _2))\). \(Pr(M_l|G)\) can be computed from \(Pr(M_h|G)\) in \(O(1)\) using Eq. 16. Assuming that \(\alpha _1,\,\beta _1,\,\alpha _2\) and \(\beta _2\) are bounded from above, the overall complexity of computing the Bayesian score is \(O(\min (N_{11},N_{12},N_{21},N_{22}))\).
Batal, I., Cooper, G.F., Fradkin, D. et al. An efficient pattern mining approach for event detection in multivariate temporal data. Knowl Inf Syst 46, 115–150 (2016). https://doi.org/10.1007/s10115-015-0819-6