Distributed and Parallel Databases

Volume 31, Issue 4, pp 543–599

Automatic optimization of stream programs via source program operator graph transformations

  • Miyuru Dayarathna
  • Toyotaro Suzumura


Distributed data stream processing is a data analysis paradigm in which massive amounts of data produced by various sources are analyzed online within real-time constraints. The execution performance of a stream program/query executed on such middleware depends largely on the programmer's ability to fine-tune the program to match the topology of the stream processing system. However, manually fine-tuning a stream program is a difficult, error-prone process that demands large amounts of programmer time and expertise, both of which are expensive to obtain. We describe an automated process for stream program performance optimization that uses semantics-preserving automatic code transformation to improve stream processing job performance. We first identify the structure of the input program and represent it as a Directed Acyclic Graph. We then transform the graph using the concepts of Tri-OP Transformation and Bi-Op Transformation. The resulting sample program space is pruned using both empirical and profiling information to obtain a ranked list of sample programs that perform better than their parent program. We implemented this methodology in a prototype stream program performance optimization mechanism called Hirundo, which was developed for optimizing SPADE programs that run on the System S stream processing run-time. Using five real-world applications (called VWAP, CDR, Twitter, Apnoea, and Bargain), we show the effectiveness of our approach. Hirundo was able to identify a version of the CDR application with 31.1 times higher performance within seven minutes on a cluster of 4 nodes.
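The workflow summarized in the abstract (represent the program as an operator graph, derive transformed candidates, keep and rank those that beat the parent) can be illustrated with a small Python sketch. This is not Hirundo's implementation, and it does not reproduce the Tri-OP or Bi-Op transformations, which the abstract does not define. The `replicate` transformation, the `toy_throughput` scoring function, and all operator names are hypothetical stand-ins chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# An operator graph as an adjacency list: operator name -> downstream operators.
Graph = Dict[str, List[str]]


def is_acyclic(graph: Graph) -> bool:
    """Return True if the graph has no directed cycle (Kahn's algorithm)."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1
    ready = [node for node, degree in indegree.items() if degree == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for target in graph.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    return visited == len(indegree)


def replicate(graph: Graph, op: str, copies: int) -> Graph:
    """Hypothetical transformation: replace `op` with `copies` parallel replicas.

    Every upstream operator feeds all replicas, and every replica feeds the
    original downstream operators of `op`.
    """
    replicas = [f"{op}_{i}" for i in range(copies)]
    new_graph: Graph = {}
    for node, targets in graph.items():
        if node == op:
            continue
        rewired: List[str] = []
        for target in targets:
            rewired.extend(replicas if target == op else [target])
        new_graph[node] = rewired
    for replica in replicas:
        new_graph[replica] = list(graph[op])
    return new_graph


def rank_candidates(
    parent: Graph,
    candidates: List[Graph],
    measure: Callable[[Graph], float],
) -> List[Tuple[float, Graph]]:
    """Keep candidates that score above the parent, best first."""
    baseline = measure(parent)
    scored = [(measure(candidate), candidate) for candidate in candidates]
    better = [pair for pair in scored if pair[0] > baseline]
    return sorted(better, key=lambda pair: pair[0], reverse=True)


def toy_throughput(graph: Graph) -> float:
    """Stand-in for a real measurement: counts operators named 'filter*'."""
    return float(sum(1 for node in graph if node.startswith("filter")))


# Hypothetical four-operator pipeline used as the parent program.
parent: Graph = {
    "source": ["filter"],
    "filter": ["aggregate"],
    "aggregate": ["sink"],
    "sink": [],
}
candidates = [replicate(parent, "filter", n) for n in (2, 3)]
ranked = rank_candidates(parent, candidates, toy_throughput)
```

In a real setting the `measure` callback would run or profile each candidate on the target cluster; the toy function here only makes the ranking step reproducible.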


Stream processing, Performance optimization, Code transformation, Data-intensive computing, Automatic tuning



This research was supported by the Japan Science and Technology Agency’s CREST project titled “Development of System Software Technologies for post-Peta Scale High Performance Computing”.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Computer Science, Tokyo Institute of Technology, Tokyo, Japan
  2. Department of Computer Science, Tokyo Institute of Technology/IBM Research-Tokyo, Tokyo, Japan
