pcStream: A Stream Clustering Algorithm for Dynamically Detecting and Managing Temporal Contexts

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9078)

Abstract

The clustering of unbounded data streams is a difficult problem since the observed instances cannot be stored for future clustering decisions. Moreover, the probability distribution of a stream tends to change over time, making it challenging to differentiate between concept drift and an anomaly. Although many excellent data-stream clustering algorithms have been proposed in the past, they are not suitable for capturing the temporal contexts of an entity.

In this paper, we propose pcStream, a novel data-stream clustering algorithm for dynamically detecting and managing sequential temporal contexts. pcStream takes into account the properties of sensor-fused data streams in order to accurately infer the present concept and dynamically detect new contexts as they occur. Moreover, the algorithm is capable of detecting point anomalies and can operate on high-velocity data streams. Lastly, our evaluation shows that pcStream outperforms state-of-the-art stream clustering algorithms in detecting real-world contexts from sensor-fused datasets. We also show how pcStream can be used as an analysis tool for contextual sensor streams.
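
The distinction the abstract draws between a point anomaly and a newly emerging context can be illustrated with a toy example. The sketch below is not the pcStream algorithm, whose details are not given in this abstract. It is a deliberately simplified, one-dimensional illustration under stated assumptions: the function name `label_stream`, the `RunningStats` helper, the z-score test, and the parameters `warmup`, `threshold`, and `drift_run` are all hypothetical. An observation far from the current context is held back; if the stream returns to the current context it is treated as an isolated anomaly, and if several far observations arrive in a row they are taken to open a new context. Only running statistics are kept, so past observations are not stored.

```python
import math


class RunningStats:
    """Welford-style running mean and standard deviation (hypothetical helper)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0


def label_stream(stream, warmup=5, threshold=3.0, drift_run=3):
    """Label each value as 'context-k' or 'anomaly' (illustrative scheme only)."""
    labels = []
    context_id = 0
    stats = RunningStats()
    pending = []  # consecutive far-away values awaiting a decision

    for x in stream:
        # Accept values unconditionally until the context has enough support.
        if stats.n < warmup:
            stats.add(x)
            labels.append(f"context-{context_id}")
            continue

        sigma = stats.std() or 1e-9  # guard against zero spread
        if abs(x - stats.mean) / sigma <= threshold:
            # Stream returned to the current context: held-back values
            # were isolated, so label them as anomalies.
            labels.extend(["anomaly"] * len(pending))
            pending = []
            stats.add(x)
            labels.append(f"context-{context_id}")
        else:
            pending.append(x)
            if len(pending) == drift_run:
                # A sustained run of far-away values opens a new context.
                context_id += 1
                stats = RunningStats()
                for p in pending:
                    stats.add(p)
                labels.extend([f"context-{context_id}"] * len(pending))
                pending = []

    # Values still pending at the end of the stream never formed a run.
    labels.extend(["anomaly"] * len(pending))
    return labels


if __name__ == "__main__":
    demo = [10, 10.1, 9.9, 10.2, 9.8, 50, 10.0, 20, 20.1, 19.9, 20.2]
    for value, label in zip(demo, label_stream(demo)):
        print(value, label)
```

In this toy scheme the single value 50 is labeled an anomaly because the stream immediately returns to the first context, whereas the sustained run around 20 is labeled as a second context. A multivariate, sensor-fused stream would need a richer distance measure than a scalar z-score.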

Author information

Corresponding author

Correspondence to Yisroel Mirsky.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Mirsky, Y., Shapira, B., Rokach, L., Elovici, Y. (2015). pcStream: A Stream Clustering Algorithm for Dynamically Detecting and Managing Temporal Contexts. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_10

  • DOI: https://doi.org/10.1007/978-3-319-18032-8_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18031-1

  • Online ISBN: 978-3-319-18032-8

  • eBook Packages: Computer Science, Computer Science (R0)
