Challenges in Learning from Streaming Data Extended Abstract

  • Conference paper
ICT Innovations 2014 (ICT Innovations 2014)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 311))

Abstract

Machine learning studies automatic methods for acquiring domain knowledge, with the goal of improving system performance as a result of experience. In the past two decades, machine learning research and practice have focused on batch learning, usually with small data sets. The rationale behind this practice is that examples are generated at random according to some stationary probability distribution. Most learners use a greedy, hill-climbing search in the space of models. They are prone to overfitting, local maxima, etc. Data are scarce and statistical estimates have high variance. A paradigmatic example is the TDIDT (top-down induction of decision trees) algorithm for learning decision trees [14]. As the tree grows, fewer and fewer examples are available to compute the sufficient statistics, and the variance increases, leading to model instability. Moreover, the growing process re-uses the same data, exacerbating the overfitting problem. Regularization and pruning mechanisms are mandatory.
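The shrinking-sample effect described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes perfectly balanced binary splits and a simple binomial class-proportion estimate at each node, and the function names (`standard_error`, `examples_per_node`, `variance_by_depth`) are hypothetical, chosen only for this illustration.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a class-proportion estimate computed from n examples."""
    return math.sqrt(p * (1.0 - p) / n)


def examples_per_node(n_root: int, depth: int, branching: int = 2) -> int:
    """Examples reaching one node at `depth`, assuming perfectly balanced splits."""
    return n_root // (branching ** depth)


def variance_by_depth(n_root: int, max_depth: int, p: float = 0.5):
    """Return (depth, examples, standard error) for each level of the tree."""
    rows = []
    for depth in range(max_depth + 1):
        n = examples_per_node(n_root, depth)
        if n == 0:
            # No examples left at this depth, so no estimate can be computed.
            break
        rows.append((depth, n, standard_error(p, n)))
    return rows


if __name__ == "__main__":
    for depth, n, se in variance_by_depth(n_root=1024, max_depth=8):
        print(f"depth={depth}  examples={n:5d}  std_error={se:.4f}")
```

Under these assumptions, each additional level halves the examples per node, so the standard error of the estimate grows by a factor of about 1.41 per level. Real trees split unevenly, so the deepest nodes can be affected even more strongly.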

References

  1. Bifet, A., Gavaldà, R.: Mining adaptively frequent closed unlabeled rooted trees in data streams. In: Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA, pp. 34–42 (2008)

  2. Bifet, A., Gavaldà, R.: Adaptive XML tree classification on evolving data streams. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009, Part I. LNCS, vol. 5781, pp. 147–162. Springer, Heidelberg (2009)

  3. Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. In: Proceedings of the Neural Information Processing Systems (2000)

  4. Chen, R., Sivakumar, K., Kargupta, H.: Collective mining of Bayesian networks from heterogeneous data. Knowledge and Information Systems Journal 6(2), 164–187 (2004)

  5. Gaber, M., Yu, P.S.: A framework for resource-aware knowledge discovery in data streams: a holistic approach with its application to clustering. In: ACM Symposium Applied Computing, pp. 649–656. ACM Press (2006)

  6. Medhat, M., Gaber, M., Krishnaswamy, S., Zaslavsky, A.: Cost-efficient mining techniques for data streams. In: Proceedings of the Second Workshop on Australasian Information Security, pp. 109–114. Australian Computer Society, Inc. (2004)

  7. Gama, J.: Knowledge Discovery from Data Streams. Data Mining and Knowledge Discovery. Chapman & Hall CRC Press, Atlanta (2010)

  8. Gama, J., Sebastião, R., Rodrigues, P.P.: Issues in evaluation of stream learning algorithms. In: KDD, pp. 329–338 (2009)

  9. Hulten, G., Domingos, P.: Catching up with the data: research issues in mining data streams. In: Proc. of Workshop on Research Issues in Data Mining and Knowledge Discovery, Santa Barbara, USA (2001)

  10. Kargupta, H., Joshi, A., Sivakumar, K., Yesha, Y.: Data Mining: Next Generation Challenges and Future Directions. AAAI Press and MIT Press (2004)

  11. Kargupta, H., Park, B.H.: Mining decision trees from data streams in a mobile environment. In: IEEE International Conference on Data Mining, pp. 281–288. IEEE Computer Society, San Jose (2001)

  12. Kargupta, H., Park, B.H., Dutta, H.: Orthogonal decision trees. IEEE Transactions on Knowledge and Data Engineering 18, 1028–1042 (2006)

  13. Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proceedings of the International Conference on Very Large Data Bases, pp. 180–191. Morgan Kaufmann, Toronto (2004)

  14. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., San Mateo (1993)

  15. Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Transactions Database Systems 32(4), 301–312 (2007)

  16. Wald, A.: Sequential Analysis. John Wiley and Sons, Inc. (1947)

Author information

Correspondence to João Gama.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Gama, J. (2015). Challenges in Learning from Streaming Data Extended Abstract. In: Bogdanova, A., Gjorgjevikj, D. (eds) ICT Innovations 2014. ICT Innovations 2014. Advances in Intelligent Systems and Computing, vol 311. Springer, Cham. https://doi.org/10.1007/978-3-319-09879-1_1

  • DOI: https://doi.org/10.1007/978-3-319-09879-1_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09878-4

  • Online ISBN: 978-3-319-09879-1

  • eBook Packages: Engineering, Engineering (R0)
