Propagation of Densities of Streaming Data within Query Graphs

  • Michael Daum
  • Frank Lauterwald
  • Philipp Baumgärtel
  • Klaus Meyer-Wegener
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6187)


Data Stream Systems (DSS) use cost models to determine whether a DSS can cope with a given workload and to optimize query graphs. However, some of the input parameters these models rely on are often unknown or highly imprecise. Selectivities in particular are stream-dependent, application-specific parameters.

In this paper, we describe a method that supports selectivity estimation by taking into account the attribute value distributions of the input streams. The novelty of our approach is the propagation of probability distributions through the query graph in order to provide estimates for the inner nodes of the graph. For the most common stream operators, we establish formulas that describe their output distribution as a function of their input distributions. For unknown operators such as User-Defined Operators (UDOs), we introduce a method to measure the influence of these operators on arbitrary probability distributions. This method performs most of the computational work before the query is deployed and introduces minimal overhead at runtime. Our evaluation framework facilitates the appropriate combination of both methods and allows almost arbitrary query graphs to be modeled.
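To make the propagation idea concrete, here is a minimal sketch under explicit assumptions: every density is an equal-width histogram, values are uniformly spread within each bin, and the UDO is a stateless, tuple-wise function. It shows two cases. First, a selection, whose output density is the input density truncated to the predicate range and renormalized, with the retained mass as its selectivity. Second, a black-box operator whose effect is measured offline as a bin-to-bin transfer matrix, so that runtime propagation reduces to a matrix-vector product. All names (`Histogram`, `propagate_filter`, `measure_udo`, `propagate_udo`) are hypothetical. This does not reproduce the paper's operator formulas or its measurement method.

```python
import random


class Histogram:
    """Equal-width histogram on [lo, hi): an assumed discretized density."""

    def __init__(self, lo: float, hi: float, probs: list[float]):
        self.lo, self.hi, self.probs = lo, hi, probs
        self.width = (hi - lo) / len(probs)


def bin_index(x: float, lo: float, hi: float, n: int) -> int:
    """Index of the equal-width bin containing x; out-of-range values are clamped."""
    i = int((x - lo) / ((hi - lo) / n))
    return min(max(i, 0), n - 1)


def propagate_filter(h: Histogram, a: float, b: float) -> tuple[float, Histogram]:
    """Selection a <= x <= b on the histogrammed attribute.

    Returns (selectivity, output density). A partially covered bin
    contributes in proportion to its overlap with [a, b], which relies on
    the within-bin uniformity assumption.
    """
    kept = []
    for i, p in enumerate(h.probs):
        left = h.lo + i * h.width
        overlap = max(0.0, min(left + h.width, b) - max(left, a))
        kept.append(p * overlap / h.width)
    sel = sum(kept)
    # Output density: the input truncated to [a, b] and renormalized.
    out = [k / sel for k in kept] if sel > 0 else [0.0] * len(kept)
    return sel, Histogram(h.lo, h.hi, out)


def measure_udo(f, in_lo, in_hi, in_bins, out_lo, out_hi, out_bins,
                samples_per_bin=1000, seed=0):
    """Offline step for a black-box, tuple-wise operator f.

    Row i estimates P(f(x) in output bin j | x in input bin i) by feeding
    f uniform samples drawn from input bin i.
    """
    rng = random.Random(seed)
    in_width = (in_hi - in_lo) / in_bins
    T = []
    for i in range(in_bins):
        left = in_lo + i * in_width
        counts = [0] * out_bins
        for _ in range(samples_per_bin):
            y = f(rng.uniform(left, left + in_width))
            counts[bin_index(y, out_lo, out_hi, out_bins)] += 1
        T.append([c / samples_per_bin for c in counts])
    return T


def propagate_udo(h: Histogram, T, out_lo: float, out_hi: float) -> Histogram:
    """Runtime step: the output density is the input-weighted mix of T's rows."""
    n_out = len(T[0])
    out = [sum(p * row[j] for p, row in zip(h.probs, T)) for j in range(n_out)]
    return Histogram(out_lo, out_hi, out)
```

For example, a uniform input on [0, 10) with 10 bins, filtered by 2 <= x <= 5, yields a selectivity of 0.3. Pushing the result through f(x) = 2x, using a transfer matrix measured offline over an output range of [0, 20), puts one third of the mass in each of output bins 2, 3, and 4. The transfer matrix is only valid for operators that map each tuple independently. Stateful operators such as joins or windowed aggregates would need a different treatment.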







Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Michael Daum (1)
  • Frank Lauterwald (1)
  • Philipp Baumgärtel (1)
  • Klaus Meyer-Wegener (1)
  1. Dept. of Computer Science, University of Erlangen-Nuremberg, Germany
