pcStream: A Stream Clustering Algorithm for Dynamically Detecting and Managing Temporal Contexts

  • Yisroel Mirsky
  • Bracha Shapira
  • Lior Rokach
  • Yuval Elovici
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9078)

Abstract

The clustering of unbounded data-streams is a difficult problem since the observed instances cannot be stored for future clustering decisions. Moreover, the probability distribution of streams tends to change over time, making it challenging to differentiate between a concept-drift and an anomaly. Although many excellent data-stream clustering algorithms have been proposed in the past, they are not suitable for capturing the temporal contexts of an entity.

In this paper, we propose pcStream, a novel data-stream clustering algorithm for dynamically detecting and managing sequential temporal contexts. pcStream takes into account the properties of sensor-fused data-streams in order to accurately infer the present concept and to dynamically detect new contexts as they occur. Moreover, the algorithm is capable of detecting point anomalies and can operate on high-velocity data-streams. Lastly, we show in our evaluation that pcStream outperforms state-of-the-art stream clustering algorithms in detecting real-world contexts from sensor-fused datasets. We also show how pcStream can be used as an analysis tool for contextual sensor streams.
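
The distinction the abstract draws between a point anomaly and the arrival of a new context can be illustrated with a small toy sketch. This is not the pcStream algorithm itself, whose details are not given on this page. It is a one-dimensional illustration under stated assumptions: each context is summarised by a running mean and standard deviation, a point "fits" a context if it lies within `threshold` standard deviations, and a run of `buffer_size` consecutive non-fitting points is treated as a new context. The names `Context`, `ToyContextDetector`, `threshold` and `buffer_size` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Context:
    """Running mean and variance of one context (Welford's update)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def distance(self, x: float) -> float:
        # Distance in units of standard deviation; the floor avoids dividing by zero.
        std = sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0
        return abs(x - self.mean) / max(std, 1e-3)


class ToyContextDetector:
    """Illustrative only: assumption-laden toy, not the published algorithm."""

    def __init__(self, seed, threshold=3.0, buffer_size=5):
        first = Context()
        for x in seed:
            first.add(x)
        self.contexts = [first]
        self.current = 0
        self.threshold = threshold
        self.buffer_size = buffer_size
        self.buffer = []     # consecutive points that fit no known context
        self.anomalies = []  # isolated points that never formed a context

    def process(self, x: float) -> int:
        dists = [c.distance(x) for c in self.contexts]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] <= self.threshold:
            # The stream returned to a known context, so whatever was
            # buffered was an isolated deviation: record it as anomalous.
            self.anomalies.extend(self.buffer)
            self.buffer.clear()
            self.contexts[best].add(x)
            self.current = best
        else:
            self.buffer.append(x)
            if len(self.buffer) >= self.buffer_size:
                # A sustained run of unfamiliar points: open a new context.
                new = Context()
                for b in self.buffer:
                    new.add(b)
                self.contexts.append(new)
                self.buffer.clear()
                self.current = len(self.contexts) - 1
        return self.current


det = ToyContextDetector(seed=[10.0, 10.2, 9.8, 10.1, 9.9])
stream = [10.1, 50.0, 9.9, 20.0, 20.3, 19.8, 20.1, 20.2, 20.0]
labels = [det.process(x) for x in stream]
print(labels, det.anomalies, len(det.contexts))
```

In this run the single value 50.0 is followed by a point that fits the original context, so it ends up in `anomalies`, while the sustained run around 20 fills the buffer and becomes a second context. The buffer length is the design choice that separates the two cases: a short buffer reacts quickly to new contexts but mistakes brief bursts of noise for them.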

Keywords

Stream clustering, Concept detection, Concept drift, Context-awareness

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yisroel Mirsky (1)
  • Bracha Shapira (1)
  • Lior Rokach (1)
  • Yuval Elovici (1)

  1. Department of Information Systems Engineering, Ben Gurion University, Be’er Sheva, Israel
