Control Considerations for Scalable Event Processing

  • Wei Xu
  • Joseph L. Hellerstein
  • Bill Kramer
  • David Patterson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3775)

Abstract

The growth in the scale of systems and networks has created many challenges for their management, especially for event processing. Our premise is that scaling event processing requires parallelism. To this end, we observe that event processing can be divided into intra-event processing, such as filtering, and inter-event processing, such as root cause analysis. Since intra-event processing is easily parallelized, we propose an architecture in which intra-event processing elements (IAPs) are replicated to scale to larger event input rates. We address two challenges in this architecture. First, the IAPs are subject to overloads that require effective flow control, a capability that was not present in the components we used to build IAPs. Second, we need to balance the load across IAPs to avoid creating resource bottlenecks. These challenges are further complicated by disturbances, such as CPU-intensive administrative tasks, that reduce event processing rates. We address these challenges using designs based on control theory, a technique for analyzing stability, accuracy, and settling times. We demonstrate the effectiveness of our approaches with testbed experiments that include a disturbance in the form of a CPU-intensive application.
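To illustrate what a control-theoretic flow controller for a single IAP might look like, here is a minimal sketch under stated assumptions; it is not the authors' design. It assumes a discrete-time fluid model in which an IAP's backlog q grows by the admitted event rate u and shrinks by the IAP's processing capacity c in each control interval, and it models the CPU-intensive disturbance as a step drop in c. A proportional-integral (PI) controller throttles u so that q tracks a reference backlog q_ref. The function simulate_flow_control, its parameters, and all numeric values are hypothetical.

```python
"""Hypothetical sketch: PI flow control for one IAP (not the paper's design).

Assumed fluid model, per control interval k:
    q[k+1] = max(0, q[k] + u[k] - c[k])
where q is the IAP's event backlog, u is the admitted event rate, and c is the
IAP's processing capacity. A CPU-intensive task is modeled as a step drop in c.
"""


def simulate_flow_control(steps=200, q_ref=50.0, kp=1.0, ki=0.25,
                          capacity=100.0, disturbed_capacity=60.0,
                          disturbance_step=50, u_max=200.0):
    q = 0.0         # backlog at the IAP (events)
    integral = 0.0  # integral term of the PI controller
    history = []    # (step, capacity, admitted rate, backlog after the step)
    for k in range(steps):
        # Disturbance: capacity drops once the CPU-intensive task starts.
        c = capacity if k < disturbance_step else disturbed_capacity
        error = q_ref - q
        # PI control law, clamped to [0, u_max], the rate the source can offer.
        u = min(max(kp * error + integral, 0.0), u_max)
        integral += ki * error
        # Fluid queue update: arrivals minus service, never negative.
        q = max(0.0, q + u - c)
        history.append((k, c, u, q))
    return history


if __name__ == "__main__":
    for k, c, u, q in simulate_flow_control()[45:60]:
        print(f"k={k:3d} capacity={c:5.1f} admitted={u:7.2f} backlog={q:6.2f}")
```

Run with the defaults, the backlog settles at q_ref = 50 events both before and after the capacity drops from 100 to 60 events per interval, the admitted rate settles at the new capacity, and the backlog peaks at about 90 events just after the disturbance. Linearizing the loop (ignoring the clamps) gives the characteristic polynomial z² − (2 − kp)z + (1 − kp + ki); for kp = 1 and ki = 0.25 this has a double pole at z = 0.5, so the loop is stable, the integral term removes steady-state error, and the response settles without oscillation. These gains are illustrative, not tuned for any real IAP.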

Copyright information

© IFIP International Federation for Information Processing 2005

Authors and Affiliations

  • Wei Xu (1)
  • Joseph L. Hellerstein (2)
  • Bill Kramer (1)
  • David Patterson (1)
  1. Computer Science Dept., University of California, Berkeley
  2. IBM T.J. Watson Research Center, Hawthorne, USA

Personalised recommendations