Abstract
Given a set of trajectories annotated with measurements of physical variables, the problem of Non-compliant Window Co-occurrence (NWC) pattern discovery aims to determine temporal signatures in the explanatory variables that are highly associated with windows of undesirable behavior in a target variable. NWC discovery is important for societal applications such as eco-friendly transportation (e.g. identifying engine signatures leading to high greenhouse gas emissions). A key challenge in designing a scalable algorithm for NWC discovery is the non-monotonicity of popular spatio-temporal statistical interest measures of association, such as the cross-K function, which renders anti-monotone pruning-based algorithms (e.g. Apriori) inapplicable to such interest measures. In our preliminary work, we proposed two upper bounds for the cross-K function and a top-down multi-parent tracking approach that uses these bounds to filter out uninteresting candidate patterns and then applies a minimum support (i.e. frequency) threshold as a post-processing step to filter out chance patterns. In this paper, we propose a novel bi-directional pruning approach (BDNMiner) that combines top-down pruning based on the cross-K function threshold with bottom-up pruning based on the minimum support threshold to efficiently mine NWC patterns. Case studies with real-world engine data demonstrate the ability of the proposed approach to discover patterns that are interesting to engine scientists. Experimental evaluation on real-world data shows that the proposed approach yields substantial computational savings compared to prior work.
Acknowledgments
This material is based upon work supported by the National Science Foundation under Grant Nos. 1029711, IIS-1320580, 0940818 and IIS-1218168, the USDOD under Grant Nos. HM1582-08-1-0017 and HM0210-13-1-0005, the Ford University Research Program (URP), and the University of Minnesota under the OVPR U-Spatial and Minnesota Supercomputing Institute (MSI) (www.msi.umn.edu).
Appendix A: Proofs of preliminary results section
Lemma 1
Given an NWC pattern C and a time lag δ, \(Upper_{Loc}(|C \overset {\delta }{\bowtie } W_{N}|)\) is an upper bound of \(|C \overset {\delta }{\bowtie } W_{N}|\).
Proof
For every NWC pattern \(\{S_{i}\}\) consisting of a single event-sequence where \(\{S_{i}\} \subseteq C\), \(1 \leq i \leq Dim(C)\), we have \(|\{S_{i}\}| \geq |C|\), where \(|\{S_{i}\}|\) and \(|C|\) are the cardinalities of the patterns \(\{S_{i}\}\) and \(C\) in the time series, respectively. Since \(\{S_{i}\} \subseteq C\), we also have \(|\{S_{i}\} \overset {\delta }{\bowtie } W_{N}| \geq |C \overset {\delta }{\bowtie } W_{N}|\), \(\forall \, 1 \leq i \leq Dim(C)\). Then, \(Upper_{Loc}(|C \overset {\delta }{\bowtie } W_{N}|) = \min \limits _{\{S_{i}\} \in C, 1\leq i \leq Dim(C)}(|\{S_{i}\} \overset {\delta }{\bowtie } W_{N}|) \geq |C \overset {\delta }{\bowtie } W_{N}|\). □
Lemma 2
Given an NWC pattern C and a time lag δ, Lower(|C|) is a lower bound of |C|.
Proof
Any superset pattern of C has a cardinality smaller than or equal to that of C. Therefore, Lower(|C|) = |superset(C)| ≤ |C|. □
Theorem 1
Given an NWC pattern C and a time lag δ, \(UB_{local}(\hat {K}_{C,W_{N}}(\delta ))\) is an upper bound of \(\hat {K}_{C,W_{N}}(\delta )\).
Proof
Using Lemmas 1 and 2, we have \(\hat {K}_{C,W_{N}}(\delta ) = \frac {T}{|W_{N}|} \times \frac {|C \overset {\delta }{\bowtie } W_{N}|}{|C|} \leq \frac {T}{|W_{N}|} \times \frac {Upper_{Loc}(|C \overset {\delta }{\bowtie } W_{N}|)}{Lower(|C|)} = UB_{local}(\hat {K}_{C,W_{N}}(\delta ))\). □
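The chain of inequalities in Theorem 1 can be checked numerically. The following is a minimal sketch rather than the authors' implementation: the function names and all counts are hypothetical, chosen only so that the premises of Lemmas 1 and 2 hold (the join count of C does not exceed any single event-sequence join count, and Lower(|C|) does not exceed |C|).

```python
from fractions import Fraction


def cross_k(T, n_windows, join_count, pattern_count):
    """K-hat = (T / |W_N|) * (|C join W_N| / |C|), as in the proof of Theorem 1."""
    return Fraction(T, n_windows) * Fraction(join_count, pattern_count)


def ub_local(T, n_windows, single_join_counts, lower_count):
    """UB_local: the numerator is the min over single event-sequence join
    counts (Upper_Loc); the denominator is Lower(|C|)."""
    return Fraction(T, n_windows) * Fraction(min(single_join_counts), lower_count)


# Hypothetical toy counts (assumptions, not data from the paper).
T = 100                      # length of the time series
n_windows = 10               # |W_N|, number of non-compliant windows
single_join_counts = [6, 9]  # |{S_i} join W_N| for each S_i in C
join_count = 4               # |C join W_N|, at most min(single_join_counts)
pattern_count = 8            # |C|
lower_count = 5              # Lower(|C|), at most |C|

k_hat = cross_k(T, n_windows, join_count, pattern_count)
bound = ub_local(T, n_windows, single_join_counts, lower_count)
print(k_hat, bound)
```

Exact rational arithmetic via `Fraction` avoids floating-point noise when comparing the measure with its bound.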
Theorem 2
Given an NWC pattern C and a time lag δ, \(UB_{lattice}(\hat {K}_{C,W_{N}}(\delta ))\) is an upper bound of \(\hat {K}_{C,W_{N}}(\delta )\).
Proof
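Definition 8 differs from Definition 7 only in that the min term is replaced by a max term. Since the maximum of the single event-sequence join-set cardinalities is at least their minimum, \(Upper_{Lat}(|C \overset {\delta }{\bowtie } W_{N}|) \geq Upper_{Loc}(|C \overset {\delta }{\bowtie } W_{N}|)\), and hence \(UB_{lattice}(\hat {K}_{C,W_{N}}(\delta )) \geq UB_{local}(\hat {K}_{C,W_{N}}(\delta )) \geq \hat {K}_{C,W_{N}}(\delta )\) by Theorem 1. □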
Lemma 3
Given an NWC pattern C and a time lag δ, \(UB_{lattice}(\hat {K}_{C,W_{N}}(\delta ))\) is monotonically decreasing with decreasing Dim(C) if Lower(|C|) is kept monotonically increasing. In other words, given two NWC patterns \(C\) and \(C^{\prime }\) where \(C^{\prime } \subset C\), if \(Lower(|C^{\prime }|) \geq Lower(|C|)\), then \(UB_{lattice}(\hat {K}_{C^{\prime },W_{N}}(\delta )) \leq UB_{lattice}(\hat {K}_{C,W_{N}}(\delta ))\).
Proof
Let \(C^{\prime } \subset C\) where \(C\) and \(C^{\prime }\) are two NWC patterns. Then \(\forall \, S_{i}(v_{i}) \in C^{\prime }\), where \(1 \leq i \leq Dim(C^{\prime })\), we have \(S_{i}(v_{i}) \in C\). Therefore, \(Upper_{Lat}(|C^{\prime } \overset {\delta }{\bowtie } W_{N}|) = \max \limits _{\{S_{i}\} \in C^{\prime }, 1\leq i \leq Dim(C^{\prime })}(|\{S_{i}\} \overset {\delta }{\bowtie } W_{N}|) \leq \max \limits _{\{S_{i}\} \in C, 1\leq i \leq Dim(C)}(|\{S_{i}\} \overset {\delta }{\bowtie } W_{N}|) = Upper_{Lat}(|C \overset {\delta }{\bowtie } W_{N}|)\) (A). Also, since \(Lower(|C|)\) is kept monotonically increasing as \(Dim(C)\) decreases, \(Lower(|C^{\prime }|) \geq Lower(|C|)\) (B). From (A) and (B), we have \(UB_{lattice}(\hat {K}_{C,W_{N}}(\delta )) = \frac {T}{|W_{N}|} \times \frac {Upper_{Lat}(|C \overset {\delta }{\bowtie } W_{N}|)}{Lower(|C|)} \geq \frac {T}{|W_{N}|} \times \frac {Upper_{Lat}(|C^{\prime } \overset {\delta }{\bowtie } W_{N}|)}{Lower(|C^{\prime }|)} = UB_{lattice}(\hat {K}_{C^{\prime },W_{N}}(\delta ))\). □
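The monotonicity stated in Lemma 3 can be illustrated with a small numeric check. This is a sketch under stated assumptions: the event-sequence names, the join counts, and the two Lower values are hypothetical, and the Lower values are picked to satisfy the lemma's premise that Lower(|C′|) is at least Lower(|C|).

```python
from fractions import Fraction


def ub_lattice(T, n_windows, join_counts, pattern, lower_count):
    """UB_lattice: the numerator is the max over single event-sequence join
    counts of the pattern (Upper_Lat); the denominator is Lower(|C|)."""
    upper_lat = max(join_counts[s] for s in pattern)
    return Fraction(T, n_windows) * Fraction(upper_lat, lower_count)


# Hypothetical |{S_i} join W_N| for three single event-sequences.
join_counts = {"S1": 6, "S2": 9, "S3": 4}

C = {"S1", "S2", "S3"}  # parent pattern
C_sub = {"S1", "S3"}    # C' is a proper subset of C

# Premise of Lemma 3: Lower(|C'|) >= Lower(|C|) (values are assumptions).
ub_C = ub_lattice(100, 10, join_counts, C, lower_count=5)
ub_sub = ub_lattice(100, 10, join_counts, C_sub, lower_count=7)
print(ub_C, ub_sub)
```

One consequence, under the same premise, is that a lattice bound below a chosen cross-K threshold for C is also below that threshold for every subset of C, which is what makes a top-down filter on this bound possible.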
Cite this article
Ali, R.Y., Gunturi, V.M.V., Kotz, A.J. et al. Discovering non-compliant window co-occurrence patterns. Geoinformatica 21, 829–866 (2017). https://doi.org/10.1007/s10707-016-0289-3
DOI: https://doi.org/10.1007/s10707-016-0289-3