Mapping Filtering Streaming Applications

Algorithmica

Abstract

In this paper, we explore the complexity of mapping filtering streaming applications on large-scale homogeneous and heterogeneous platforms, with a particular emphasis on communication models and their impact. Filtering applications are streaming applications where each node also has a selectivity, which either increases or decreases the size of its input data set. This selectivity makes the problem of scheduling these applications more challenging than the better-studied problem of scheduling “non-filtering” streaming workflows. We address the complexity of the following two problems:

  • Evaluation: Given a mapping of nodes to processors, how can one compute the period and latency?

  • Optimization: Given a filtering workflow, how can one compute the mapping and schedule that minimize the period or latency? A solution to this problem requires generating both the mapping and the associated operation list—the order in which each processor executes its assigned tasks.
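To make the two problems concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not necessarily the paper's exact model: each service runs alone on its own processor, communication is free, the initial input has unit size, and a service's input volume is the product of the selectivities of all its ancestors (the usual convention in filter-ordering work). `evaluate` answers the Evaluation question for a given DAG, taking the period as the slowest processor's time and the latency as the longest path. The hypothetical helpers `chain` and `rank_order` illustrate one tiny Optimization case, ordering independent filters into a single chain on unit-speed processors, using the classic exchange-argument rule from query optimization rather than any algorithm from this paper.

```python
import math
from itertools import permutations


def evaluate(costs, sels, preds, speeds):
    """Period and latency of a mapping with one service per processor and no
    communication cost (a simplifying assumption of this sketch).

    costs[i], sels[i]: cost and selectivity of service i
    preds[i]:          direct predecessors of service i (the graph must be a DAG)
    speeds[i]:         speed of the processor that runs service i
    """
    n = len(costs)
    anc = {}

    def ancestors(i):
        # Every service upstream of i, found by a memoized walk over predecessors.
        if i not in anc:
            found = set()
            for p in preds[i]:
                found.add(p)
                found |= ancestors(p)
            anc[i] = found
        return anc[i]

    # Input volume of i = product of its ancestors' selectivities (unit input data).
    time = [costs[i] * math.prod(sels[a] for a in ancestors(i)) / speeds[i]
            for i in range(n)]
    finish = {}

    def finish_time(i):
        # Service i starts once all of its predecessors have finished.
        if i not in finish:
            start = max((finish_time(p) for p in preds[i]), default=0.0)
            finish[i] = start + time[i]
        return finish[i]

    period = max(time)                               # slowest processor
    latency = max(finish_time(i) for i in range(n))  # longest path
    return period, latency


def chain(order):
    """Predecessor lists for a linear chain that runs the services in `order`."""
    preds = {order[0]: []}
    for prev, cur in zip(order, order[1:]):
        preds[cur] = [prev]
    return [preds[i] for i in range(len(order))]


def rank_order(costs, sels):
    """Exchange-argument order for independent filters in one chain:
    shrinking filters (sel < 1), then neutral ones (sel == 1), then growing
    ones (sel > 1), each group sorted by increasing cost / (1 - sel)."""
    def key(i):
        c, s = costs[i], sels[i]
        if s == 1:
            return (1, 0.0)
        return (0 if s < 1 else 2, c / (1 - s))
    return sorted(range(len(costs)), key=key)


# Evaluation: fork-join DAG 0 -> {1, 2} -> 3; service 1 runs on a speed-2 processor.
print(evaluate([2, 4, 6, 3], [0.5, 2.0, 0.25, 1.0],
               [[], [0], [0], [1, 2]], [1, 2, 1, 1]))  # (3.0, 5.75)

# Optimization (toy): order four independent filters into one chain, unit speeds.
costs, sels, unit = [4.0, 1.0, 3.0, 2.0], [0.5, 0.9, 2.0, 0.2], [1.0] * 4


def latency(order):
    return evaluate(costs, sels, chain(order), unit)[1]


best = rank_order(costs, sels)
print(best, latency(best))                              # best == [3, 0, 1, 2]
print(min(latency(o) for o in permutations(range(4))))  # should match latency(best)
```

The brute-force minimum is there only to check the ordering rule on a toy instance; it grows factorially with the number of filters. Note how the filter with selectivity 2.0 is pushed to the end: placing it early would double the data every later filter must process.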

We address this general problem in two steps. First, we address the simplified model without communication costs. In this case, the evaluation problems are easy, and the optimization problems have polynomial complexity on homogeneous platforms. However, we show that the optimization problems become NP-hard on heterogeneous platforms. Second, we consider platforms with communication costs. Since the model without communication costs is a special case, the optimization problems on heterogeneous platforms remain NP-hard. Therefore we come back to homogeneous platforms and extend the framework with three realistic communication models. Now even the evaluation problems become difficult, because the mapping must be enriched with an operation list that provides the time-steps at which each computation and each communication occurs in the system: determining the best operation list is a combinatorial problem. Unsurprisingly, the optimization problems are NP-hard too. Altogether, this paper provides a comprehensive overview of the additional difficulties induced by heterogeneity and communication costs.
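Why the operation list matters can be illustrated with a deliberately tiny case. The sketch below assumes a hypothetical one-port model, not necessarily any of the paper's three communication models: a processor that has just finished its computation must send its output to several successors one message at a time, and each successor computes once its message has arrived. The order of the sends changes when the last successor finishes. For this single sender, serving first the successor with the most remaining work is optimal (Jackson's rule for single-machine scheduling with delivery times), which the brute-force check confirms; the hardness results above concern the full problem, in which many such choices interact.

```python
from itertools import permutations


def finish_time(order, comm, work):
    """Time at which the last successor finishes if messages are sent in `order`.

    Hypothetical one-port model: messages leave the sender one after another,
    comm[i] is the transfer time to successor i and work[i] its computation time.
    """
    t, latest = 0.0, 0.0
    for i in order:
        t += comm[i]                       # message to successor i completes at time t
        latest = max(latest, t + work[i])  # i can start computing only then
    return latest


def largest_work_first(comm, work):
    """Jackson-style rule: serve successors in nonincreasing order of work."""
    return sorted(range(len(comm)), key=lambda i: work[i], reverse=True)


comm = [1.0, 3.0, 2.0, 1.0]
work = [2.0, 1.0, 5.0, 4.0]
order = largest_work_first(comm, work)
print(order, finish_time(order, comm, work))                            # [2, 3, 0, 1] 8.0
print(finish_time([0, 1, 2, 3], comm, work))                            # 11.0
print(min(finish_time(o, comm, work) for o in permutations(range(4))))  # 8.0
```

Sending in index order leaves the successor with the most work (successor 2) waiting for a late message, so the same four transfers finish at 11 instead of 8.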

Author information

Corresponding author

Correspondence to Fanny Dufossé.

Additional information

Part of this paper appeared in IPDPS’09 and SPAA’09.

About this article

Cite this article

Agrawal, K., Benoit, A., Dufossé, F. et al. Mapping Filtering Streaming Applications. Algorithmica 62, 258–308 (2012). https://doi.org/10.1007/s00453-010-9453-6

DOI: https://doi.org/10.1007/s00453-010-9453-6