Discovering non-compliant window co-occurrence patterns

GeoInformatica

Abstract

Given a set of trajectories annotated with measurements of physical variables, the problem of Non-compliant Window Co-occurrence (NWC) pattern discovery aims to determine temporal signatures in the explanatory variables that are highly associated with windows of undesirable behavior in a target variable. NWC discovery is important for societal applications such as eco-friendly transportation (e.g., identifying engine signatures leading to high greenhouse gas emissions). Challenges in designing a scalable algorithm for NWC discovery include the non-monotonicity of popular spatio-temporal statistical interest measures of association, such as the cross-K function, which renders anti-monotone pruning-based algorithms (e.g., Apriori) inapplicable to such interest measures. In our preliminary work, we proposed two upper bounds for the cross-K function and a top-down multi-parent tracking approach that uses these bounds to filter out uninteresting candidate patterns and then applies a minimum support (i.e., frequency) threshold as a post-processing step to filter out chance patterns. In this paper, we propose a novel bi-directional pruning approach (BDNMiner) that combines top-down pruning based on the cross-K function threshold with bottom-up pruning based on the minimum support threshold to efficiently mine NWC patterns. Case studies with real-world engine data demonstrate the ability of the proposed approach to discover patterns that are interesting to engine scientists. Experimental evaluation on real-world data shows that the proposed approach yields substantial computational savings compared to prior work.
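The non-monotonicity of the cross-K function can be seen with a small numeric sketch. It uses the estimator form that appears in Theorem 1 of Appendix A, \(\hat{K}_{C,W_{N}}(\delta) = \frac{T}{|W_{N}|} \times \frac{|C \overset{\delta}{\bowtie} W_{N}|}{|C|}\). All counts below (T, |W_N|, the join sizes, the pattern cardinalities, and the threshold) are hypothetical values chosen for illustration, and treating |C| as the support is an assumption of this sketch.

```python
def cross_k(T, n_windows, join, card):
    """Cross-K estimate T / |W_N| * |C join_delta W_N| / |C| (form used in Theorem 1)."""
    return T / n_windows * join / card


# Hypothetical toy counts (not from the paper's data).
T, n_windows = 100, 5   # time-series length and |W_N|
theta = 10.0            # hypothetical cross-K threshold

# Pattern {A} and its superset {A, B}. A superset never has more instances
# (support) or more co-occurrences with W_N than its subset.
support_a, join_a = 10, 2
support_ab, join_ab = 2, 2

k_a = cross_k(T, n_windows, join_a, support_a)     # 20 * 2 / 10 = 4.0
k_ab = cross_k(T, n_windows, join_ab, support_ab)  # 20 * 2 / 2  = 20.0

# Support is anti-monotone: it never grows when the pattern is extended.
assert support_ab <= support_a
# Cross-K is not: {A} fails the threshold but its superset {A, B} passes,
# so Apriori-style pruning of every superset of {A} would lose {A, B}.
print(k_a < theta, k_ab >= theta)  # True True
```

Support, unlike cross-K, remains safe for bottom-up (Apriori-style) pruning, which is the property a minimum support threshold can exploit.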

Figs. 1–11

References

  1. Aggarwal CC, Bhuiyan MA, Hasan MA (2014) Frequent pattern mining algorithms: A survey. In: Frequent pattern mining, Springer

  2. Agrawal R, Srikant R, et al. (1994) Fast algorithms for mining association rules. In: Proc. 20th int. conf. very large data bases, VLDB, vol 1215, pp 487–499

  3. Ali RY, Gunturi VM, Kotz AJ, Shekhar S, Northrop WF (2015) Discovering non-compliant window co-occurrence patterns: A summary of results. In: Advances in Spatial and Temporal Databases. Springer, pp 391–410

  4. Assanis DN, Filipi ZS, Fiveland SB, Syrimis M (2003) A predictive ignition delay correlation under steady-state and transient operation of a direct injection diesel engine. J Eng Gas Turbines Power 125(2):450–457

  5. Cohen E, et al. (2001) Finding interesting associations without support pruning. IEEE Trans Knowl Data Eng 13(1):64–78

  6. Das G, et al. (1998) Rule discovery from time series. In: Proceedings of the ACM International Conference on Knowledge and Data Discovery, pp 16–22

  7. Daw CS, Finney CEA, Tracy ER (2003) A review of symbolic analysis of experimental data. Rev Sci Instrum 74(2):915–930

  8. DieselNet (2015) Heavy-Duty Onroad Engines. https://www.dieselnet.com/standards/us/hd.php

  9. Diggle PJ, Chetwynd AG, Häggkvist R, Morris SE (1995) Second-order analysis of space-time clustering. Stat Methods Med Res 4(2):124–136

  10. Dixon PM (2002) Ripley’s K function. Encyclopedia of Environmetrics

  11. Federal Highway Administration (2014) Annual vehicle distance traveled in miles and related data - 2011. https://www.fhwa.dot.gov/policyinformation/statistics/2011/pdf/vm1.pdf

  12. Fang K, Li Z, Shenton A, Fuente D, Gao B (2015) Black box dynamic modeling of a gasoline engine for constrained model-based fuel economy optimization. Tech. rep., SAE Technical Paper

  13. Gabriel E, Diggle PJ (2009) Second-order analysis of inhomogeneous spatio-temporal point process data. Statistica Neerlandica 63(1):43–51

  14. Harms SK, Deogun JS (2004) Sequential association rule mining with time lags. J Intell Inf Syst 22(1):7–22

  15. Huang Y, et al. (2003) Mining confident co-location rules without a support threshold. In: Proceedings of the 2003 ACM symposium on Applied computing, pp 497–501

  16. Kotsiantis S, Kanellopoulos D (2006) Discretization techniques: a recent survey. GESTS Intl Trans on Comput Sci Eng 32(1):47–58

  17. Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Disc 15(2):107–144

  18. McIntosh T, Chawla S (2007) High confidence rule mining for microarray analysis. IEEE/ACM Trans Comput Biol Bioinform 4(4):611–623

  19. McKibben B (2014) Climate change impacts in the United States: the third national climate assessment

  20. Misra C, et al. (2013) In-use NOx emissions from model year 2010 and 2011 heavy-duty diesel engines equipped with aftertreatment devices. Environ Sci Tech 47(14):7892–7898

  21. Office of Transportation & Air Quality (2014) MPG: Label values vs. corporate average fuel economy (CAFE) values label MPG

  22. Sacchi L, Larizza C, Combi C, Bellazzi R (2007) Data mining with temporal abstractions: learning rules from time series. Data Min Knowl Disc 15(2):217–247

  23. Schiermeier Q (2015) The science behind the Volkswagen emissions scandal. Nature

  24. Schluter T, Conrad S (2011) About the analysis of time series with temporal association rule mining. In: IEEE Symposium on Computational Intelligence and Data Mining, pp 325–332

  25. Shen W, Wang J, Han J (2014) Sequential pattern mining. In: Frequent pattern mining. Springer

  26. Turns SR (2012) An Introduction to Combustion: Concepts and Applications, vol 287, 3rd edn. McGraw-Hill, New York

  27. US Environmental Protection Agency (2014a) Ground level ozone health effects

  28. US Environmental Protection Agency (2014b) Inventory of U.S. Greenhouse Gas Emissions and Sinks: 1990 - 2012. https://www3.epa.gov/climatechange/ghgemissions/usinventoryreport.html

  29. US Government Publishing Office (2015) 40 cfr ch. u section 1036.108. http://goo.gl/fg5NyV

  30. Vijayaraghavan K, et al. (2012) Effects of light duty gasoline vehicle emission standards in the United States on ozone and particulate matter. Atmos Environ 60:109–120

  31. Wang J, He QP (2010) Multivariate statistical process monitoring based on statistics pattern analysis. Ind Eng Chem Res 49(17):7858–7869

  32. Wikipedia (2015) Sudden unintended acceleration. https://goo.gl/OvMi6w

  33. Zakaria W, Kotb Y, Ghaleb F (2014) MCR-Miner: Maximal confident association rules miner algorithm for up/down-expressed genes. Appl Math 8(2):799–809

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. 1029711, IIS-1320580, 0940818 and IIS-1218168, the USDOD under Grant No. HM1582-08-1-0017 and HM0210-13-1-0005, Ford University Research Program (URP), and the University of Minnesota under the OVPR U-Spatial and Minnesota Supercomputing Institute (MSI) (www.msi.umn.edu).

Author information

Correspondence to Reem Y. Ali.

Appendix A: Proofs of preliminary results section

Lemma 1

Given an NWC pattern C and a time lag δ, \(Upper_{Loc}(|C \overset {\delta }{\bowtie } W_{N}|)\) is an upper bound of \(|C \overset {\delta }{\bowtie } W_{N}|\).

Proof

For every NWC pattern \(\{S_{i}\}\) consisting of a single event-sequence, where \(\{S_{i}\} \subseteq C\) and \(1 \leq i \leq Dim(C)\), we have \(|\{S_{i}\}| \geq |C|\), where \(|\{S_{i}\}|\) and \(|C|\) are the cardinalities of the patterns \(\{S_{i}\}\) and C in the time series, respectively. Since \(\{S_{i}\} \subseteq C\), we also have \(|\{S_{i}\} \overset {\delta }{\bowtie } W_{N}| \geq |C \overset {\delta }{\bowtie } W_{N}|\) for all \(1 \leq i \leq Dim(C)\). Then \(Upper_{Loc}(|C \overset {\delta }{\bowtie } W_{N}|) = \min \limits _{\{S_{i}\} \in C, 1\leq i \leq Dim(C)}(|\{S_{i}\} \overset {\delta }{\bowtie } W_{N}|) \geq |C \overset {\delta }{\bowtie } W_{N}|\). □
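Lemma 1 can be checked by brute force on a toy timeline. The sketch below does not use the paper's definitions of a pattern instance or of the join \(\overset{\delta}{\bowtie}\); it assumes, hypothetically, that C occurs at a time step when all of its event-sequences occur there, and that an instance at t joins W_N when some non-compliant window starts in (t, t + δ]. Under those assumptions every instance of C is also an instance of each \(\{S_{i}\}\), so the minimum over the single-sequence join counts cannot fall below C's own join count. The occurrence sets and window start times are made up.

```python
# Hypothetical toy timeline: each event-sequence maps to the time steps at
# which it occurs. ASSUMPTION (not the paper's definition): pattern C occurs
# at t iff every event-sequence in C occurs at t.
occurrences = {
    "S1": {1, 2, 5, 8, 9},
    "S2": {2, 5, 6, 9},
    "S3": {2, 3, 5, 9},
}
# Hypothetical start times of the non-compliant windows W_N.
nc_windows = {4, 7, 10}


def instances(pattern):
    """Time steps at which every event-sequence of the pattern occurs."""
    return set.intersection(*(occurrences[s] for s in pattern))


def join_count(pattern, delta):
    """ASSUMED join size: instances t of the pattern followed by a
    non-compliant window starting within (t, t + delta]."""
    return sum(
        1 for t in instances(pattern)
        if any(t < w <= t + delta for w in nc_windows)
    )


def upper_loc(pattern, delta):
    """Upper_Loc: minimum join count over the single-sequence sub-patterns."""
    return min(join_count([s], delta) for s in pattern)


C = ["S1", "S2", "S3"]
delta = 2
print(join_count(C, delta), upper_loc(C, delta))  # 3 4
```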

Lemma 2

Given an NWC pattern C and a time lag δ, Lower(|C|) is a lower bound of |C|.

Proof

Any superset pattern of C has a cardinality smaller than or equal to that of C. Therefore, Lower(|C|) = |superset(C)| ≤ |C|. □

Theorem 1

Given an NWC pattern C and a time lag δ, \(UB_{local}(\hat {K}_{C,W_{N}}(\delta ))\) is an upper bound of \(\hat {K}_{C,W_{N}}(\delta )\).

Proof

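By Lemma 1 the numerator of the estimator can only increase when replaced by its upper bound, and by Lemma 2 the denominator can only decrease when replaced by its lower bound, so \(\hat {K}_{C,W_{N}}(\delta ) = \frac {T}{|W_{N}|} \times \frac {|C \overset {\delta }{\bowtie } W_{N}|}{|C|} \leq \frac {T}{|W_{N}|} \times \frac {Upper_{Loc}(|C \overset {\delta }{\bowtie } W_{N}|)}{Lower(|C|)} = UB_{local}(\hat {K}_{C,W_{N}}(\delta ))\). □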
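Theorem 1 is what makes the bound usable as a filter: if \(UB_{local}\) is already below the cross-K threshold, the exact estimate must be too. The following is a minimal sketch with hypothetical counts and a hypothetical threshold `theta`; it takes the single-sequence join counts and Lower(|C|) as given inputs rather than computing them, and the helper names are illustrative only.

```python
def cross_k(T, n_windows, join, card):
    """Exact cross-K estimate T / |W_N| * |C join W_N| / |C|."""
    return T / n_windows * join / card


def ub_local(T, n_windows, single_joins, lower_card):
    """UB_local: Upper_Loc (min single-sequence join count) over Lower(|C|)."""
    return T / n_windows * min(single_joins) / lower_card


def can_prune(T, n_windows, single_joins, lower_card, theta):
    """True when even the upper bound is below theta, so C can be
    discarded without computing its exact cross-K value."""
    return ub_local(T, n_windows, single_joins, lower_card) < theta


# Hypothetical counts for a 3-sequence pattern C (illustration only).
T, n_windows = 1000, 20
single_joins = [12, 7, 9]  # |{S_i} join W_N| for i = 1..3
join_c, card_c = 5, 30     # true |C join W_N| (<= 7) and |C|
lower_c = 25               # Lower(|C|) <= |C|

k = cross_k(T, n_windows, join_c, card_c)           # 50 * 5 / 30, about 8.33
ub = ub_local(T, n_windows, single_joins, lower_c)  # 50 * 7 / 25 = 14.0
print(k <= ub, can_prune(T, n_windows, single_joins, lower_c, 15.0))  # True True
```

With `theta = 10.0` the same call returns False: the bound (14.0) is inconclusive, so the exact value (about 8.33) would still have to be computed even though C ultimately fails the threshold.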

Theorem 2

Given an NWC pattern C and a time lag δ, \(UB_{lattice}(\hat {K}_{C,W_{N}}(\delta ))\) is an upper bound of \(\hat {K}_{C,W_{N}}(\delta )\).

Proof

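Definition 8 differs from Definition 7 only in that the min term is replaced by a max term. Since the maximum of a set of values is never smaller than its minimum, \(Upper_{Lat}(|C \overset {\delta }{\bowtie } W_{N}|) \geq Upper_{Loc}(|C \overset {\delta }{\bowtie } W_{N}|)\), and therefore \(UB_{lattice}(\hat {K}_{C,W_{N}}(\delta )) \geq UB_{local}(\hat {K}_{C,W_{N}}(\delta )) \geq \hat {K}_{C,W_{N}}(\delta )\) by Theorem 1. □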

Lemma 3

Given an NWC pattern C and a time lag δ, \(UB_{lattice}(\hat {K}_{C,W_{N}}(\delta ))\) is monotonically decreasing with decreasing Dim(C) if Lower(|C|) is kept monotonically increasing as Dim(C) decreases. In other words, given two NWC patterns C and C′ where C′ ⊂ C, if Lower(|C′|) ≥ Lower(|C|), then \(UB_{lattice}(\hat {K}_{C^{\prime },W_{N}}(\delta )) \leq UB_{lattice}(\hat {K}_{C,W_{N}}(\delta ))\).

Proof

Let C′ ⊂ C, where C and C′ are two NWC patterns. Then for every \(S_{i}(v_{i}) \in C^{\prime }\), where \(1 \leq i \leq Dim(C^{\prime })\), we have \(S_{i}(v_{i}) \in C\). Therefore, \(Upper_{Lat}(|C^{\prime } \overset {\delta }{\bowtie } W_{N}|) = \max \limits _{\{S_{i}\} \in C^{\prime }, 1\leq i \leq Dim(C^{\prime })}(|\{S_{i}\} \overset {\delta }{\bowtie } W_{N}|) \leq \max \limits _{\{S_{i}\} \in C, 1\leq i \leq Dim(C)}(|\{S_{i}\} \overset {\delta }{\bowtie } W_{N}|) = Upper_{Lat}(|C \overset {\delta }{\bowtie } W_{N}|)\) (A). Also, since Lower(|C|) is kept monotonically increasing as Dim(C) decreases, Lower(|C′|) ≥ Lower(|C|) (B). From (A) and (B), we have \(UB_{lattice}(\hat {K}_{C,W_{N}}(\delta )) = \frac {T}{|W_{N}|} \times \frac {Upper_{Lat}(|C \overset {\delta }{\bowtie } W_{N}|)}{Lower(|C|)} \geq \frac {T}{|W_{N}|} \times \frac {Upper_{Lat}(|C^{\prime } \overset {\delta }{\bowtie } W_{N}|)}{Lower(|C^{\prime }|)} = UB_{lattice}(\hat {K}_{C^{\prime },W_{N}}(\delta ))\). □
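Lemma 3 is the property that allows \(UB_{lattice}\) to be used while moving down the pattern lattice. The sketch below checks it exhaustively over all sub-patterns of a hypothetical four-sequence pattern. The join counts are made up, and letting Lower(|C|) depend only on Dim(C) is a simplifying assumption chosen to satisfy the lemma's premise, not the paper's definition of Lower.

```python
from itertools import combinations

# Hypothetical inputs (illustration only).
T, n_windows = 1000, 20
single_join = {"S1": 12, "S2": 7, "S3": 9, "S4": 4}  # |{S_i} join W_N|
# Simplifying assumption: Lower(|C|) depends only on Dim(C) and is
# non-decreasing as Dim(C) decreases, the premise of Lemma 3.
lower_by_dim = {4: 10, 3: 15, 2: 25, 1: 40}


def ub_lattice(pattern):
    """UB_lattice: Upper_Lat (max single-sequence join count) over Lower(|C|)."""
    upper_lat = max(single_join[s] for s in pattern)
    return T / n_windows * upper_lat / lower_by_dim[len(pattern)]


def all_patterns(sequences):
    """Every non-empty sub-pattern of the given event-sequences."""
    for k in range(1, len(sequences) + 1):
        yield from combinations(sequences, k)


full = tuple(single_join)
for c in all_patterns(full):
    for c_sub in all_patterns(c):
        if len(c_sub) < len(c):  # proper sub-pattern C' of C
            assert ub_lattice(c_sub) <= ub_lattice(c)
print(ub_lattice(full), ub_lattice(("S4",)))  # 60.0 5.0
```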

About this article

Cite this article

Ali, R.Y., Gunturi, V.M.V., Kotz, A.J. et al. Discovering non-compliant window co-occurrence patterns. Geoinformatica 21, 829–866 (2017). https://doi.org/10.1007/s10707-016-0289-3

  • DOI: https://doi.org/10.1007/s10707-016-0289-3
