Challenges in Learning from Streaming Data Extended Abstract

Gama, João

doi:10.1007/978-3-319-09879-1_1

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 311)

Included in the following conference series:

International Conference on ICT Innovations

Abstract

Machine learning studies automatic methods for acquiring domain knowledge, with the goal of improving system performance as the result of experience. In the past two decades, machine learning research and practice have focused on batch learning, usually with small data sets. The rationale behind this practice is that examples are generated at random according to some stationary probability distribution. Most learners use a greedy, hill-climbing search in the space of models. They are prone to overfitting, local maxima, etc. Data are scarce and statistical estimates have high variance. A paradigmatic example is the TDIT algorithm to learn decision trees [14]. As the tree grows, fewer and fewer examples are available to compute the sufficient statistics, and the variance increases, leading to model instability. Moreover, the growing process re-uses the same data, exacerbating the overfitting problem. Regularization and pruning mechanisms are mandatory.
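
The shrinking-sample effect described in the abstract can be illustrated with a small sketch. It is not taken from the paper: it assumes, purely for illustration, perfectly balanced binary splits and a class proportion of 0.5 at every node, and the function names (`examples_per_node`, `proportion_std_error`, `variance_by_depth`) are hypothetical. Under those assumptions, it shows how the number of examples reaching a node falls with depth and how the standard error of a proportion estimated at that node rises.

```python
import math


def examples_per_node(n_total: int, depth: int, branching: int = 2) -> float:
    """Average number of examples reaching a node at `depth`.

    Assumption (not from the paper): every split is perfectly balanced,
    so each level divides the data evenly among `branching` children.
    """
    return n_total / (branching ** depth)


def proportion_std_error(p: float, n: float) -> float:
    """Standard error of a proportion p estimated from n examples."""
    return math.sqrt(p * (1.0 - p) / n)


def variance_by_depth(n_total: int, max_depth: int, p: float = 0.5):
    """Return (depth, examples, std_error) for each depth from 0 to max_depth."""
    rows = []
    for depth in range(max_depth + 1):
        n = examples_per_node(n_total, depth)
        rows.append((depth, n, proportion_std_error(p, n)))
    return rows


if __name__ == "__main__":
    # Illustrative numbers only: 1024 training examples, tree grown to depth 8.
    for depth, n, se in variance_by_depth(1024, 8):
        print(f"depth={depth}  examples={n:7.1f}  std_error={se:.4f}")
```

With these illustrative numbers, each additional level halves the data available at a node, so the statistics that drive the split decisions near the leaves are estimated from only a handful of examples.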

References

[1] Bifet, A., Gavaldà, R.: Mining adaptively frequent closed unlabeled rooted trees in data streams. In: Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, Las Vegas, USA, pp. 34–42 (2008)
[2] Bifet, A., Gavaldà, R.: Adaptive XML tree classification on evolving data streams. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009, Part I. LNCS, vol. 5781, pp. 147–162. Springer, Heidelberg (2009)
[3] Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. In: Proceedings of the Neural Information Processing Systems (2000)
[4] Chen, R., Sivakumar, K., Kargupta, H.: Collective mining of Bayesian networks from heterogeneous data. Knowledge and Information Systems Journal 6(2), 164–187 (2004)
[5] Gaber, M., Yu, P.S.: A framework for resource-aware knowledge discovery in data streams: a holistic approach with its application to clustering. In: ACM Symposium Applied Computing, pp. 649–656. ACM Press (2006)
[6] Medhat, M., Gaber, M., Krishnaswamy, S., Zaslavsky, A.: Cost-efficient mining techniques for data streams. In: Proceedings of the Second Workshop on Australasian Information Security, pp. 109–114. Australian Computer Society, Inc. (2004)
[7] Gama, J.: Knowledge Discovery from Data Streams. Data Mining and Knowledge Discovery. Chapman & Hall CRC Press, Atlanta (2010)
[8] Gama, J., Sebastião, R., Rodrigues, P.P.: Issues in evaluation of stream learning algorithms. In: KDD, pp. 329–338 (2009)
[9] Hulten, G., Domingos, P.: Catching up with the data: research issues in mining data streams. In: Proc. of Workshop on Research Issues in Data Mining and Knowledge Discovery, Santa Barbara, USA (2001)
[10] Kargupta, H., Joshi, A., Sivakumar, K., Yesha, Y.: Data Mining: Next Generation Challenges and Future Directions. AAAI Press and MIT Press (2004)
[11] Kargupta, H., Park, B.H.: Mining decision trees from data streams in a mobile environment. In: IEEE International Conference on Data Mining, pp. 281–288. IEEE Computer Society, San Jose (2001)
[12] Kargupta, H., Park, B.H., Dutta, H.: Orthogonal decision trees. IEEE Transactions on Knowledge and Data Engineering 18, 1028–1042 (2006)
[13] Kifer, D., Ben-David, S., Gehrke, J.: Detecting change in data streams. In: Proceedings of the International Conference on Very Large Data Bases, pp. 180–191. Morgan Kaufmann, Toronto (2004)
[14] Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, Inc., San Mateo (1993)
[15] Sharfman, I., Schuster, A., Keren, D.: A geometric approach to monitoring threshold functions over distributed data streams. ACM Transactions Database Systems 32(4), 301–312 (2007)
[16] Wald, A.: Sequential Analysis. John Wiley and Sons, Inc. (1947)

Author information

Authors and Affiliations

LIAAD-INESC TEC, University of Porto, Porto, Portugal
João Gama
Faculty of Economics, University of Porto, Porto, Portugal
João Gama

Corresponding author

Correspondence to João Gama.

Editor information

Editors and Affiliations

Faculty of Computer Science and Engineering, Ss Cyril and Methodius University, Skopje, Macedonia
Ana Madevska Bogdanova
Faculty of Computer Science and Engineering, Ss Cyril and Methodius University, Skopje, Macedonia
Dejan Gjorgjevikj

Cite this paper

Gama, J. (2015). Challenges in Learning from Streaming Data Extended Abstract. In: Bogdanova, A., Gjorgjevikj, D. (eds) ICT Innovations 2014. Advances in Intelligent Systems and Computing, vol 311. Springer, Cham. https://doi.org/10.1007/978-3-319-09879-1_1

DOI: https://doi.org/10.1007/978-3-319-09879-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09878-4
Online ISBN: 978-3-319-09879-1
eBook Packages: Engineering, Engineering (R0)
