Finding It Now: Networked Classifiers in Real-Time Stream Mining Systems

  • Raphael Ducasse
  • Cem Tekin
  • Mihaela van der Schaar
Chapter

Abstract

This chapter describes and optimizes the specifications of signal processing systems that extract valuable information in real time from large-scale decentralized datasets. The first section explains the motivations and stakes and describes the key characteristics and challenges of stream mining applications. We then formalize an analytical framework that is used to describe and optimize distributed knowledge extraction from large-scale streams. In stream mining applications, classifiers are organized into a connected topology mapped onto a distributed infrastructure. We study linear chains and optimize the ordering of the classifiers to increase classification accuracy and minimize delay. We then present a decentralized decision framework for joint topology construction and local classifier configuration. In many cases, the accuracy of the classifiers is not known beforehand. In the last section, we look at how to learn the classifiers' characteristics online without increasing computational overhead. Stream mining is an active field of research at the intersection of several disciplines, including multimedia signal processing, distributed systems, and machine learning. Accordingly, we indicate several areas for future research and development.

Notes

Acknowledgements

This chapter is based upon work supported by the National Science Foundation under Grant No. 1016081. We would like to thank Dr. Deepak Turaga (IBM Research) for introducing us to the topic of stream mining, for many productive conversations about the material of this chapter, and for providing us with Figs. 1 and 3 of this chapter. We would also like to thank Dr. Fangwen Fu and Dr. Brian Foo, who were PhD students in Prof. van der Schaar's group and made contributions to the area of stream mining from which this chapter benefited. Finally, we thank Mr. Siming Song for kindly helping us with formatting and polishing the final version of the chapter.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Raphael Ducasse (1)
  • Cem Tekin (2)
  • Mihaela van der Schaar (3, 4)

  1. The Boston Consulting Group, Boston, USA
  2. Bilkent University, Ankara, Turkey
  3. Oxford-Man Institute, Oxford, UK
  4. University of California, Los Angeles, Los Angeles, USA
