Top-k queries over web applications

  • Regular Paper
The VLDB Journal

Abstract

The core logic of web applications that suggest some particular service, such as online shopping, e-commerce, etc., is typically captured by Business Processes (BPs). Among all the (possibly infinitely many) execution flows of a BP, analysts are often interested in identifying the flows that are “most important” according to some weight metric. The goal of the present paper is to provide efficient algorithms for top-k query evaluation over the possible executions of Business Processes, under a given weight function. Unique difficulties in top-k analysis in this setting stem from the facts that (1) the number of possible execution flows of a given BP is typically very large, or even infinite in the presence of recursion, and (2) the weights (e.g., likelihood, monetary cost, etc.) induced by actions performed during the execution (e.g., product purchase) may be inter-dependent (due to probabilistic dependencies, combined discount deals, etc.). We exemplify these difficulties, and overcome them to provide efficient algorithms for query evaluation where possible. We also describe in detail an application prototype, based on our model and algorithms, that we have developed for recommending optimal navigation in an online shopping web site.
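To make the problem statement concrete, the following is a minimal sketch of best-first top-k enumeration over a toy recursive process. It is not the algorithm of the paper. It assumes a simplified model that does not come from the source: a BP is a dictionary mapping each compound activity to its alternative implementations, each implementation carries a history-independent choice weight in (0, 1], a flow is flattened to a sequence of atomic activities, and the weight of a flow is the product of its choice weights. The names `top_k_flows`, `bp`, and `max_steps` are hypothetical. Under these assumptions the weight of a partial flow bounds the weight of every completion from above, so full flows are emitted in non-increasing weight order even though the set of flows is infinite.

```python
import heapq
from itertools import count


def top_k_flows(bp, root, k, max_steps=100_000):
    """Return up to k (weight, trace) pairs, heaviest first.

    bp maps a compound activity name to a list of (weight, body) pairs;
    names that are not keys of bp are treated as atomic activities.
    Assumes every weight is in (0, 1], so expanding never raises a weight.
    """
    tie = count()  # tie-breaker: equal weights are resolved arbitrarily
    # Heap entries: (-weight, tie, executed atomic prefix, pending activities)
    heap = [(-1.0, next(tie), (), (root,))]
    results = []
    steps = 0
    # max_steps guards against BPs where the search could run forever
    while heap and len(results) < k and steps < max_steps:
        steps += 1
        neg_w, _, done, pending = heapq.heappop(heap)
        if not pending:
            # Nothing left to expand: this is a full flow
            results.append((-neg_w, done))
            continue
        head, rest = pending[0], pending[1:]
        if head not in bp:
            # Atomic activity: record it, weight unchanged
            heapq.heappush(heap, (neg_w, next(tie), done + (head,), rest))
            continue
        for weight, body in bp[head]:
            # One successor per alternative implementation of the activity
            heapq.heappush(
                heap, (neg_w * weight, next(tie), done, tuple(body) + rest)
            )
    return results


# Toy recursive BP: "shop" may loop through "browse" arbitrarily often,
# so the number of full flows is infinite.
bp = {
    "shop": [(0.6, ["search", "pick"]), (0.4, ["browse", "shop"])],
    "pick": [(0.7, ["buy"]), (0.3, ["quit"])],
}

for weight, trace in top_k_flows(bp, "shop", 3):
    print(round(weight, 4), " -> ".join(trace))
```

If the process has fewer than k full flows, the sketch simply returns all of them. Inter-dependent weights, which the abstract names as the second difficulty, are outside the scope of this simplified model.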

Notes

  1. Certain EX-flows may have equal weights, which implies that there may be several valid solutions to the problem, in which case we pick one arbitrarily. Similarly, the BP specification may have fewer than \(k\) possible EX-flows, in which case we will simply return the set of all possible flows as an answer.

  2. A similar example can be designed for a monotonically decreasing function and top-k flows.

  3. Recall that we are interested in finding the top-k finite, full EX-flows.

  4. Recall that we are currently assuming that \(cWeight\) is history-independent, and relax this assumption in Sect. 3.3.

  5. Note that there may be multiple EX-flows tied as best; in this case, we only claim that at least one of them satisfies the lemma.

  6. Again, there may be multiple such rankings due to ties, and we only claim existence of one satisfying the lemma.

  7. This equality is an equality between EX-flows, corresponding to a node-label and edge-relation preserving isomorphism.

  8. This weight is uniquely defined, since \(cWeight\) has a history bound of \(b\).

  9. For an EX-flow \(e\), recall that \(\varPi ^{\prime }(e)\) is obtained from \(e\) by replacing each activity name \(a\) in \(e\) by \(\pi ^{\prime }(a)\).

  10. To our knowledge, there are no other published benchmarks for top-k algorithms in the context of Web Applications.

Acknowledgments

This work has been partially funded by the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) / ERC grant MoDaS, agreement 291071, by the Israel Ministry of Science, and by the US-Israel Binational Science Foundation.

Author information

Corresponding author

Correspondence to Daniel Deutch.

About this article

Cite this article

Deutch, D., Milo, T. & Polyzotis, N. Top-k queries over web applications. The VLDB Journal 22, 519–542 (2013). https://doi.org/10.1007/s00778-012-0303-9

  • DOI: https://doi.org/10.1007/s00778-012-0303-9
