A Context-Sensitive Framework for Mining Concept Drifting Data Streams

Predictive Maintenance in Dynamic Systems

Abstract

In this chapter, we present the staged learning approach to classification in a non-stationary data stream. Unlike the standard data stream mining paradigm, which assumes that change is always present, the staged approach senses the level of volatility in the stream and adjusts the mode of learning accordingly. We propose a scheme for measuring volatility and construct a volatility detector that monitors the stream. We model the data stream as having two states, a high-volatility state and a low-volatility state, with transitions between them driven by the level of volatility in the stream. In segments of high volatility, an ensemble of online classifiers is used for learning; in segments of low volatility, maximum use is made of past concepts, which are encoded by compact versions of Fourier spectra. The staged approach improves accuracy and throughput while reducing memory usage, as demonstrated by our experiments on a wide range of real-world and synthetic datasets.
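
The two-state model described in the abstract can be illustrated with a minimal sketch. It is not the chapter's detector: the abstract does not say how volatility is measured, so the sketch assumes volatility is the number of drift signals seen in a sliding window, and it assumes a simple hysteresis rule for switching states. The names `VolatilitySwitch`, `learner_for`, `window`, `enter_high`, and `exit_high` are hypothetical and chosen only for illustration.

```python
from collections import deque

LOW, HIGH = "low", "high"


class VolatilitySwitch:
    """Toy two-state volatility tracker (illustrative assumption only).

    Volatility is taken to be the number of drift signals observed in the
    last `window` instances. The stream enters the high-volatility state
    when that count reaches `enter_high` and returns to the low-volatility
    state when it falls to `exit_high` or below.
    """

    def __init__(self, window=1000, enter_high=3, exit_high=1):
        if exit_high > enter_high:
            raise ValueError("exit_high must not exceed enter_high")
        self.recent = deque(maxlen=window)  # 1 = drift signalled, 0 = none
        self.enter_high = enter_high
        self.exit_high = exit_high
        self.state = LOW

    def update(self, drift_detected):
        """Record one instance's drift signal and return the current state."""
        self.recent.append(1 if drift_detected else 0)
        drift_count = sum(self.recent)
        if self.state == LOW and drift_count >= self.enter_high:
            self.state = HIGH
        elif self.state == HIGH and drift_count <= self.exit_high:
            self.state = LOW
        return self.state


def learner_for(state):
    """Name the learning mode the abstract associates with each state."""
    if state == HIGH:
        return "online ensemble"
    return "stored Fourier spectra"


if __name__ == "__main__":
    switch = VolatilitySwitch(window=5, enter_high=2, exit_high=0)
    for signal in [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]:
        state = switch.update(signal)
        print(signal, state, learner_for(state))
```

Using separate entry and exit thresholds keeps the switch from flapping between modes when the drift count hovers near a single cut-off. Whether the chapter uses a comparable rule cannot be determined from the abstract alone.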

Notes

  1. From http://moa.cms.waikato.ac.nz/.

  2. From http://moa.cms.waikato.ac.nz/datasets/.

  3. From https://c3.nasa.gov/dashlink/resources/.

Author information

Corresponding author

Correspondence to Chamari I. Kithulgoda.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kithulgoda, C.I., Pears, R. (2019). A Context-Sensitive Framework for Mining Concept Drifting Data Streams. In: Lughofer, E., Sayed-Mouchaweh, M. (eds) Predictive Maintenance in Dynamic Systems. Springer, Cham. https://doi.org/10.1007/978-3-030-05645-2_4

  • DOI: https://doi.org/10.1007/978-3-030-05645-2_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05644-5

  • Online ISBN: 978-3-030-05645-2

  • eBook Packages: Engineering, Engineering (R0)
