Cost-Based Vectorization of Instance-Based Integration Processes

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5739)

Abstract

Integration processes, an abstraction of workflow-based integration tasks, are often inefficient because of low resource utilization and significant waiting times for external systems. To overcome these problems, we proposed the concept of process vectorization, in which instance-based integration processes are transparently executed with the pipes-and-filters execution model. Here, the term vectorization is used in the sense of processing a sequence (vector) of messages by one standing process. Although process vectorization has been shown to achieve a significant throughput improvement, the concept has two major drawbacks. First, the theoretical performance of a vectorized integration process mainly depends on the performance of the most cost-intensive operator. Second, the practical performance strongly depends on the number of available threads. In this paper, we present an advanced optimization approach that addresses both problems. To this end, we generalize the vectorization problem and explain how to vectorize process plans in a cost-based manner. Because the problem has exponential complexity, we provide a heuristic computation approach and formally analyze its optimality. Our evaluation shows that the message throughput can be significantly increased compared to both the instance-based execution and the rule-based process vectorization.
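
The two drawbacks named in the abstract can be illustrated with a small sketch. It is not the algorithm of the paper, which is not reproduced on this page. It assumes a purely sequential plan, a steady-state pipeline without queueing or communication overhead, and that the rule-based variant uses one thread per operator. The function names (`instance_based_cost`, `greedy_buckets`, `pipeline_bottleneck`) and the operator costs are hypothetical. The greedy grouping merges adjacent operators into execution buckets as long as a bucket stays no more expensive than the most cost-intensive operator, so the bottleneck is unchanged while fewer threads are needed.

```python
def instance_based_cost(costs):
    """Per-message time when one instance runs all operators one after another."""
    return sum(costs)


def greedy_buckets(costs, limit=None):
    """Group adjacent operators into buckets whose summed cost stays <= limit.

    Assumption for this sketch: limit defaults to the cost of the most
    expensive operator, because no grouping can beat that bottleneck.
    """
    if limit is None:
        limit = max(costs)
    buckets, current, current_cost = [], [], 0
    for index, cost in enumerate(costs):
        # Close the current bucket if adding this operator would exceed the limit.
        if current and current_cost + cost > limit:
            buckets.append(current)
            current, current_cost = [], 0
        current.append(index)
        current_cost += cost
    if current:
        buckets.append(current)
    return buckets


def pipeline_bottleneck(costs, buckets):
    """Steady-state time per message: the cost of the most expensive bucket."""
    return max(sum(costs[i] for i in bucket) for bucket in buckets)


# Hypothetical operator costs of a sequential plan (arbitrary time units).
costs = [2, 1, 1, 4, 1, 2, 1]

# Assumed rule-based variant: every operator gets its own bucket (thread).
fully_vectorized = [[i] for i in range(len(costs))]
# Cost-based variant of this sketch: merge cheap neighbours.
cost_based = greedy_buckets(costs)

print("instance-based time per message:", instance_based_cost(costs))
print("full vectorization:", len(fully_vectorized), "threads, bottleneck",
      pipeline_bottleneck(costs, fully_vectorized))
print("cost-based grouping:", len(cost_based), "threads, bottleneck",
      pipeline_bottleneck(costs, cost_based))
```

Under these assumptions both pipelined variants are limited by the operator with cost 4, but the grouped plan reaches that bound with three threads instead of seven. The sketch only covers sequential plans. The general problem treated in the paper is described as having exponential complexity, which is why a heuristic is used there.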

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boehm, M., Habich, D., Preissler, S., Lehner, W., Wloka, U. (2009). Cost-Based Vectorization of Instance-Based Integration Processes. In: Grundspenkis, J., Morzy, T., Vossen, G. (eds) Advances in Databases and Information Systems. ADBIS 2009. Lecture Notes in Computer Science, vol 5739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03973-7_19

  • DOI: https://doi.org/10.1007/978-3-642-03973-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03972-0

  • Online ISBN: 978-3-642-03973-7

  • eBook Packages: Computer Science, Computer Science (R0)
