Skip to main content
Log in

Ensemble anomaly detection from multi-resolution trajectory features

  • Published:
Data Mining and Knowledge Discovery Aims and scope Submit manuscript

Abstract

The numerical, sequential observation of behaviors, such as trajectories, have become an important subject for data mining and knowledge discovery research. Processing the raw observation into representative features of the behaviors involves an implicit choice of time-scale and resolution, which critically affect the final output of the mining techniques. The choice is associated with the parameters of data-processing, e.g., smoothing and segmentation, which unintuitively yet strongly influence the intrinsic structure of the numerical data. Data mining techniques generally require users to provide an appropriately processed input, but selecting a resolution is an arduous task that may require an expensive, manual examination of outputs between different settings. In this paper, we propose a novel ensemble framework for aggregating outcomes in different settings of scale and resolution parameters for an anomaly detection task. Such a task is difficult for existing ensemble approaches based on weighted combination because: (a) evaluating and weighing an output requires training samples of anomalies which are generally unavailable, (b) the detectability of anomalies can depend on the resolution, i.e., the distinction from normal instances may only be apparent within a small, selective range of parameters. In the proposed framework, predictions based on different resolutions are aggregated to construct meta-feature representations of the behavior instances. The meta-features provide the discriminative information for conducting a clustering-based anomaly detection. In the proposed framework, two interrelated tasks of the behavior analysis: processing the numerical data and discovering anomalous patterns, are addressed jointly, providing an intuitive alternative for a knowledge-intensive parameter selection. We also design an efficient clustering-based anomaly detection algorithm which reduces the computational burden of mining at multiple resolutions. We conduct an empirical study of the proposed framework using real-world trajectory data. It shows that the proposed framework achieves a significant improvement over the conventional ensemble approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14
Fig. 15
Fig. 16
Fig. 17
Fig. 18
Fig. 19

Similar content being viewed by others

Notes

  1. http://www.japan-robotech.com.

  2. The dataset is a property of JST-ANR project (http://www.i.kyushu-u.ac.jp/~suzuki/jstanr.htm) and available from the project web page.

  3. http://bcb.cs.tufts.edu/frac/.

  4. http://keis.ei.st.gunma-u.ac.jp/~ando/expdat_dami.html.

References

  • Ailon N, Charikar M, Newman A (2008) Aggregating inconsistent information: ranking and clustering. J ACM 55, 23:1–23:27

    Google Scholar 

  • Ando S, Thanomphongphan T, Hoshino D, Seki Y, Suzuki E (2011) ACE: anomaly clustering ensemble for multi-perspective anomaly detection in robot behaviors. In: Proceedings of the tenth SIAM international conference on data mining, pp 1–12

  • Angiulli F, Basta S, Pizzuti C (2006) Distance-based detection and prediction of outliers. IEEE Trans Knowl Data Eng 18(2):145–160

    Article  Google Scholar 

  • Angiulli F, Fassetti F (2010) Distance-based outlier queries in data streams: the novel task and algorithms. Data Min Knowl Discov 20(2):290–324

    Article  MathSciNet  Google Scholar 

  • Anjum N, Cavallaro A (2008) Multifeature object trajectory clustering for video analysis. IEEE Trans Circuits Syst Video Technol 18(11):1555–1564

    Article  Google Scholar 

  • Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, pp 1027–1035

  • Bache K, Lichman M (2012) UCI machine learning repository. University of California, Irvine, School of Information and Computer Science. http://archive.ics.uci.edu/ml. Accessed Mar 2012

  • Banerjee A, Langford J (2004) An objective evaluation criterion for clustering. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 515–520

  • Blanchard G, Lee G, Scott C (2010) Semi-supervised novelty detection. J Mach Learn Res 11:2973–3009

    MATH  MathSciNet  Google Scholar 

  • Bonchi F, Castillo C, Donato D, Gionis A (2009) Taxonomy-driven lumping for sequence mining. Data Min Knowl Discov 19(2):227–244

    Article  MathSciNet  Google Scholar 

  • Breunig MM, Kriegel HP, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. SIGMOD Rec 29(2):93–104

    Article  Google Scholar 

  • Bu Y, Chen L, Fu AWC, Liu D (2009) Efficient anomaly monitoring over moving object trajectory streams. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 159–168

  • Budhaditya S, Pham DS, Lazarescu M, Venkatesh S (2009) Effective anomaly detection in sensor networks data streams. In: Proceedings of the 2009 ninth IEEE international conference on data mining, ICDM’09. IEEE Computer Society, Washington, DC, pp 722–727

  • Castro N, Azevedo PJ (2010) Multiresolution motif discovery in time series. In: Proceedings of tenth SIAM international conference on data mining. SIAM, pp 665–676

  • Chandola V, Banerjee A, Kumar V (2009) Anomaly detection: a survey. ACM Comput Surv 41(3):1–58

    Article  Google Scholar 

  • Chiu B, Keogh E, Lonardi S (2003) Probabilistic discovery of time series motifs. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 493–498

  • Cotofrei P, Stoffel K (2002) Classification rules + time = temporal rules. In: Proceedings of the international conference on computational science-Part I. Springer-Verlag, London, pp 572–581

  • Daubechies I (1992) Ten lectures on wavelets. Society for Industrial and Applied Mathematics, Philadelphia

    Book  MATH  Google Scholar 

  • Dereszynski E, Dietterich T (2007) Probabilistic models for anomaly detection in remote sensor data streams. In: Proceedings of the twenty-third conference annual conference on uncertainty in artificial intelligence, UAI-07. AUAI Press, Corvallis, pp 75–82

  • Dietterich TG (2000) Ensemble methods in machine learning. In: Proceedings of the first international workshop on multiple classifier systems. Springer-Verlag, London, pp 1–15

  • Ester M, Kriegel HP, Sander Jö, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96). AAAI Press, Portland, pp 226–231

  • Fern XZ, Brodley CE (2004) Solving cluster ensemble problems by bipartite graph partitioning. In: Proceedings of the twenty-first international conference on machine learning. ACM, New York, pp 36–43

  • Fraley C, Raftery AE (1998) How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 41(8):578–588

    Article  MATH  Google Scholar 

  • Freire A, Barreto G, Veloso M, Varela A (2009) Short-term memory mechanisms in neural network learning of robot navigation tasks: a case study. In: Proceedings of the 6th Latin American Robotics, Symposium (LARS2009), pp 1–6

  • Gaffney S, Smyth P (1999) Trajectory clustering with mixtures of regression models. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 63–72

  • Ghoting A, Parthasarathy S, Otey ME (2008) Fast mining of distance-based outliers in high-dimensional datasets. Data Min Knowl Discov 16(3):349–364

    Article  MathSciNet  Google Scholar 

  • Giannotti F, Nanni M, Pinelli F, Pedreschi D (2007) Trajectory pattern mining. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, pp 330–339

  • Gionis A, Mannila H, Tsaparas P (2007) Clustering aggregation. ACM Trans Knowl Discov Data 1(1):1–30

    Article  Google Scholar 

  • Han J, Lee JG, Gonzalez H, Li X (2008) Mining massive RFID, trajectory, and traffic data sets (Tutorial). In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York

  • Hido S, Tsuboi Y, Kashima H, Sugiyama M, Kanamori T (2011) Statistical outlier detection using direct density ratio estimation. Knowl Inf Syst 26:309–336

    Article  Google Scholar 

  • Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20(8):832–844

    Article  Google Scholar 

  • Jain AK, Murty MN, Flynn PJ (1999) Data clustering: a review. ACM Comput Surv 31(3):264–323

    Article  Google Scholar 

  • Jiang S, Ferreira J, Gonzälez M (2012) Clustering daily patterns of human activities in the city. Data Min Knowl Discov 25:478–510

    Article  MATH  MathSciNet  Google Scholar 

  • Johnson N, Hogg D (1995) Learning the distribution of object trajectories for event recognition. In: Proceedings of the sixth british conference on machine vision B, vol 2. BMVA Press, Surrey, pp 583–592

  • Keogh E, Lin J (2005) Clustering of time-series subsequences is meaningless: implications for previous and future research. Knowl Inf Syst 8(2):154–177

    Article  Google Scholar 

  • Keogh E, Lin J, Fu A (2005) HOT SAX: efficiently finding the most unusual time series subsequence. In: Proceedings of the fifth IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp 226–233

  • Khalid S, Naftel A (2005) Classifying spatiotemporal object trajectories using unsupervised learning of basis function coefficients. In: Proceedings of the third ACM international workshop on video surveillance & sensor networks. ACM, New York, pp 45–52

  • Khalid S, Naftel A (2006) Classifying spatiotemporal object trajectories using unsupervised learning in the coefficient feature space. Multimed Syst 12(3):227–238

    Article  Google Scholar 

  • Kim S, Cho NW, Kang B, Kang SH (2011) Fast outlier detection for very large log data. Expert Syst Appl 38(8):9587–9596

    Article  Google Scholar 

  • Knorr EM, Ng RT, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3–4):237–253

    Article  Google Scholar 

  • Kröger T (2010) On-line trajectory generation in robotic systems, springer tracts in advanced robotics, vol 58. Springer, Berlin

    Book  Google Scholar 

  • Kumar S, Nguyen HT, Suzuki E (2010) Understanding the behaviour of reactive robots in a patrol task by analysing their trajectories. In: Proceedings of the 2010 IEEE/WIC/ACM international conference on web intelligence and intelligent agent technology, vol 02, WI-IAT’10. IEEE Computer Society, Washington, DC, pp 56–63

  • Lazarevic, A., Kumar V (2005) Feature bagging for outlier detection. In: Proceeding of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining. ACM Press, New York, pp 157–166

  • Lee JG, Han J, Li X (2008) Trajectory outlier detection: a partition-and-detect framework. In: Proceedings of the 2008 IEEE 24th international conference on data engineering, ICDM’08. IEEE Computer Society, Washington, DC, pp 140–149

  • Lehmann EL (2006) Nonparametrics: statistical methods based on ranks (revised edition). Springer, New York

    Google Scholar 

  • Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Discov 15(2):107–144

    Article  MathSciNet  Google Scholar 

  • Liu Z, Yu JX, Chen L, Wu D (2008) Detection of shape anomalies: a probabilistic approach using hidden markov models. In: Proceedings of the 2008 IEEE 24th international conference on data engineering. IEEE Computer Society, Washington, DC, pp 1325–1327

  • Markou M, Singh S (2003a) Novelty detection: a review—part 1: statistical approaches. Signal Process 83:2481–2497

    Google Scholar 

  • Markou M, Singh S (2003b) Novelty detection: a review—part 2: neural network based approaches. Signal Process 83:2499–2521

    Google Scholar 

  • Markou M, Singh S (2006) A neural network-based novelty detector for image sequence analysis. IEEE Trans Pattern Anal Mach Intell 28(10):1664–1677

    Article  Google Scholar 

  • Morris B, Trivedi M (2008) Learning, modeling, and classification of vehicle track patterns from live video. IEEE Trans Intell Transp Syst 9(3):425–437

    Article  Google Scholar 

  • Morris B, Trivedi M (2009) Learning trajectory patterns by clustering: experimental studies and comparative evaluation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 312–319

  • Morris BT, Trivedi MM (2008) A survey of vision-based trajectory learning and analysis for surveillance. IEEE Trans Circuits Syst Video Technol 18(8):1114–1127

    Article  Google Scholar 

  • Nguyen HV, Ang HH, Gopalkrishnan V (2010) Mining outliers with ensemble of heterogeneous detectors on random subspaces. In: Proceedings of the 15th international conference on database systems for advanced applications, DASFAA’10, vol I. Springer, Berlin, pp 368–383

  • Noto K, Brodley C, Slonim D (2012) FRaC: a feature-modeling approach for semi-supervised and unsupervised anomaly detection. Data Min Knowl Discov 25(1):109–133

    Article  MathSciNet  Google Scholar 

  • Pelekis N, Kopanakis I, Kotsifakos EE, Frentzos E, Theodoridis Y (2011) Clustering uncertain trajectories. Knowl Inf Syst 28(1):117–147

    Article  Google Scholar 

  • Pham DT, Chan AB (1998) Control chart pattern recognition using a new type of self-organizing neural network. In: Proceedings of the institution of mechanical engineers, part I. J Syst Control Eng 212(2):115–127

  • Piciarelli C, Foresti GL (2006) On-line trajectory clustering for anomalous events detection. Pattern Recogn Lett 27(15):1835–1842

    Article  Google Scholar 

  • Piciarelli C, Foresti GL (2007) Anomalous trajectory detection using support vector machines. In: Proceedings of the 2007 IEEE conference on advanced video and signal based surveillance. IEEE Computer Society, Washington, DC, pp 153–158

  • Piciarelli C, Micheloni C, Foresti G (2008) Trajectory-based anomalous event detection. IEEE Trans Circuits Syst Video Technol 18(11):1544–1554

    Article  Google Scholar 

  • Porikli F, Haga T (2004) Event detection by eigenvector decomposition using object and frame features. In: Conference on computer vision and pattern recognition workshop, 2004. CVPRW ’04, p 114

  • Roddick JF, Spiliopoulou M (2002) A survey of temporal knowledge discovery paradigms and methods. IEEE Trans Knowl Data Eng 14:750–767

    Article  Google Scholar 

  • Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev 33:1–39

    Article  Google Scholar 

  • Rosswog J, Ghose K (2012) Detecting and tracking coordinated groups in dense, systematically moving, crowds. In: Proceedings of the twelfth SIAM international conference on data mining, pp 1–11

  • Saito N (1994) Local feature extraction and its applications using a library of bases. Ph.D. Thesis, Yale University, New Haven

  • Steland A (1998) Bootstrapping rank statistics. Metrika 47:251–264

    Article  MATH  MathSciNet  Google Scholar 

  • Strehl A, Ghosh J (2003) Cluster ensembles—a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

    MATH  MathSciNet  Google Scholar 

  • Suzuki N, Hirasawa K, Tanaka K, Kobayashi Y, Sato Y, Fujino Y (2007) Learning motion patterns and anomaly detection by human trajectory analysis. In: IEEE international conference on systems, man and cybernetics, ISIC2007, pp 498–503

  • Wan L, Ng WK, Dang XH, Yu PS, Zhang K (2009) Density-based clustering of data streams at multiple resolutions. ACM Trans Knowl Discov Data 3, 14:1–14:28

    Google Scholar 

  • Wang Q, Megalooikonomou V, Faloutsos C (2010) Time series analysis with multiple resolutions. Inf Syst 35(1):56–74

    Article  Google Scholar 

  • Wang X, Li G, Jiang G, Shi Z (2011) Semantic trajectory-based event detection and event pattern mining. Knowl Inf Syst. doi:10.1007/s10115-011-0471-8

  • Webb A, Copsey K (2011) Statistical pattern recognition. Wiley, New York

    Book  MATH  Google Scholar 

  • Williams BH, Toussaint M, Storkey AJ (2007) A primitive based generative model to infer timing information in unpartitioned handwriting data. In: Proceedings of the 20th international joint conference on artifical intelligence, IJCAI’07. Morgan Kaufmann Publishers Inc., San Francisco, pp 1119–1124

  • Xiong Y, Yeung DY (2002) Mixtures of ARMA models for model-based time series clustering. In: Proceedings of the 2002 IEEE international conference on data mining. IEEE Computer Society, Washington, DC, pp. 717–720

  • Xu R, Wunsch D (2005) Survey of clustering algorithms. IEEE Trans Neural Netw 16(3):645–678

    Article  Google Scholar 

  • Yamanishi K, Takeuchi J, Williams G, Milne P (2004) On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Min Knowl Discov 8(3):275–300

    Article  MathSciNet  Google Scholar 

  • Yang Q (2009) Activity recognition: linking low-level sensors to high-level intelligence. In: Proceedings of the 21st international joint conference on artifical intelligence, IJCAI’09. Morgan Kaufmann Publishers Inc., San Francisco, CA, pp 20–25

  • Yang Y, Chen K (2011) Temporal data clustering via weighted clustering ensemble with different representations. IEEE Trans Knowl Data Eng 23:307–320

    Article  Google Scholar 

  • Yankov D, Keogh E, Rebbapragada U (2008) Disk-aware discord discovery: finding unusual time series in terabyte sized datasets. Knowl Inf Syst 17(2):241–262

    Article  Google Scholar 

  • Zheng Y, Zhou X (2011) Computing with spatial trajectories, 1st edn. Springer Publishing Company, Incorporated, New York

    Book  Google Scholar 

Download references

Acknowledgments

The authors would like to thank the handling editor and the anonymous reviewers for their valuable comments and suggestions. Parts of this study are supported by the Strategic International Cooperative Program funded by Japan Science and Technology Agency and the Grant-in-Aid for Scientific Research on Fundamental Research (B) 21300053, (B) 25280085, and (C) 22500119 by the Japanese Ministry of Education, Culture, Sports, Science and Technology.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shin Ando.

Additional information

Responsible editor: M.J. Zaki.

This paper extends work previously published as Ando et al. (2011).

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 23 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ando, S., Thanomphongphan, T., Seki, Y. et al. Ensemble anomaly detection from multi-resolution trajectory features. Data Min Knowl Disc 29, 39–83 (2015). https://doi.org/10.1007/s10618-013-0334-x

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10618-013-0334-x

Keywords

Navigation