Volume 62, Issue 1–2, pp 258–308

Mapping Filtering Streaming Applications

  • Kunal Agrawal
  • Anne Benoit
  • Fanny Dufossé
  • Yves Robert


In this paper, we explore the complexity of mapping filtering streaming applications on large-scale homogeneous and heterogeneous platforms, with a particular emphasis on communication models and their impact. Filtering applications are streaming applications where each node also has a selectivity which either increases or decreases the size of its input data set. This selectivity makes the problem of scheduling these applications more challenging than the more studied problem of scheduling “non-filtering” streaming workflows. We address the complexity of the following two problems:
  • Evaluation: Given a mapping of nodes to processors, how can one compute the period and latency?

  • Optimization: Given a filtering workflow, how can one compute the mapping and schedule that minimize the period or latency? A solution to this problem requires generating both the mapping and the associated operation list—the order in which each processor executes its assigned tasks.

We address this general problem in two steps. First, we address the simplified model without communication cost. In this case, the evaluation problems are easy, and the optimization problems have polynomial complexity on homogeneous platforms. However, we show that the optimization problems become NP-hard on heterogeneous platforms. Second, we consider platforms with communication costs. Clearly, due to the previous results, the optimization problems on heterogeneous platforms are still NP-hard. Therefore we come back to homogeneous platforms and extend the framework with three significant realistic communication models. Now even evaluation problems become difficult, because the mapping must now be enriched with an operation list that provides the time-steps at which each computation and each communication occurs in the system: determining the best operation list has a combinatorial nature. Not too surprisingly, optimization problems are NP-hard too. Altogether, this paper provides a comprehensive overview of the additional difficulties induced by heterogeneity and communication costs.
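To make the evaluation problem concrete in the simplest setting (no communication cost), here is a minimal sketch under stated assumptions; it is not the paper's algorithm. It assumes a linear chain of nodes, each with a hypothetical `cost` (work per unit of input) and `selectivity` (ratio of output size to input size), so that the data entering a node is scaled by the product of the selectivities of the nodes before it. It further assumes that the period is the largest per-processor load for one data set and that the latency is the sum of all computation times along the chain. The function name `evaluate_chain` and its parameters are illustrative only.

```python
from collections import defaultdict


def evaluate_chain(costs, selectivities, mapping, speeds):
    """Return (period, latency) for a linear chain of filtering nodes.

    Assumptions (not taken from the paper): no communication cost,
    node i runs on processor mapping[i] with speed speeds[mapping[i]],
    and the first node receives a data set of relative size 1.
    """
    size = 1.0                  # relative size of the data entering the current node
    load = defaultdict(float)   # compute time per processor for one data set
    latency = 0.0               # total time for one data set to traverse the chain

    for cost, selectivity, proc in zip(costs, selectivities, mapping):
        t = size * cost / speeds[proc]
        load[proc] += t
        latency += t
        size *= selectivity     # selectivity shrinks (<1) or grows (>1) the data

    period = max(load.values())  # the most loaded processor bounds the throughput
    return period, latency


period, latency = evaluate_chain(
    costs=[4.0, 2.0, 8.0],
    selectivities=[0.5, 2.0, 1.0],
    mapping=[0, 1, 0],          # nodes 0 and 2 share processor 0
    speeds=[1.0, 1.0],          # homogeneous platform
)
print(period, latency)
```

In this example the first node halves the data, so the second node costs only half of its nominal work; swapping the order of nodes would change both values, which is why the ordering matters in filtering workflows.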


Query optimization, Web service, Streaming application, Workflow, Communication model, Period, Latency, Complexity results





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Kunal Agrawal (1)
  • Anne Benoit (2)
  • Fanny Dufossé (2)
  • Yves Robert (2)
  1. Washington University in St. Louis, St. Louis, USA
  2. ENS Lyon et Université de Lyon, Lyon Cedex, France
