Abstract
The core logic of web applications that suggest some particular service, such as online shopping, e-commerce etc., is typically captured by Business Processes (BPs). Among all the (maybe infinitely many) possible execution flows of a BP, analysts are often interested in identifying flows that are “most important”, according to some weight metric. The goal of the present paper is to provide efficient algorithms for top-k query evaluation over the possible executions of Business Processes, under some given weight function. Unique difficulties in top-k analysis in this settings stem from (1) the fact that the number of possible execution flows of a given BP is typically very large, or even infinite in presence of recursion and (2) that the weights (e.g., likelihood, monetary cost, etc.) induced by actions performed during the execution (e.g., product purchase) may be inter-dependent (due to probabilistic dependencies, combined discount deals etc.). We exemplify these difficulties, and overcome them to provide efficient algorithms for query evaluation where possible. We also describe in details an application prototype that we have developed for recommending optimal navigation in an online shopping web site that is based on our model and algorithms.
Similar content being viewed by others
Notes
Certain EX-flows may have equal weights, which implies that there may be several valid solutions to the problem, in which case we pick one arbitrarily. Similarly, the BP specification may have fewer than \(k\) possible EX-flows, in which case we will simply return the set of all possible flows as an answer.
A similar example can be designed for monotonically decreasing function and top-k flows.
Recall that we are interested in finding the top-k finite, full EX-flows.
Recall that we are currently assuming that \(cWeight\) is history-independent, and relax this assumption in Sect. 3.3.
Note that there may be multiple EX-flows tied as best; in this case, we only claim that at least one of them satisfies the lemma.
Again, there may be multiple such rankings due to ties, and we only claim existence of one satisfying the lemma.
This equality is an equality between EX-flows, corresponding to a node-label, edge-relation and preserving isomorphism.
This weight is uniquely defined, since \(cWeight\) has an history bound of \(b\).
For an EX-flow \(e\), recall that \(\varPi ^{\prime }(e)\) is obtained from \(e\) by replacing each activity name \(a\) in \(e\) by \(\pi ^{\prime }(a)\).
To our knowledge, there are no other published benchmarks for top-k algorithms in the context of eb Applications.
References
Abiteboul, S., Senellart, P.: Querying and updating probabilistic information in xml. In: Proceedings of EDBT (2006)
Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of VLDB (2007)
Bast, H., Majumdar, D., Schenkel, R., Theobald, M., Weikum, G.: Io-top-k: index-access optimized top-k query processing. In: Proceedings of VLDB (2006)
Beeri, C., Eyal, A., Kamenkovich, S., Milo, T.: Querying business processes. In: Proceedings of VLDB (2006)
Benedikt, M., Kharlamov, E., Olteanu, D., Senellart, P.: Probabilistic xml via markov chains. PVLDB 3(1), 770–781 (2010)
Business Process Execution Language for Web Services. http://www.ibm.com/developerworks/library/ws-bpel/
Cohen, S., Kimelfeld, B.: Querying parse trees of stochastic context-free grammars. In: ICDT, pp. 62–75 (2010)
Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: Proceedings of VLDB (2004)
Dechter, R., Pearl, J.: Generalized best-first search strategies and the optimality of a*. J, ACM 32(3) (1985)
Deutch, D., Milo, T.: Type inference and type checking for queries on execution traces. In: Proceedings of VLDB (2008)
Deutch, D., Milo, T.: Evaluating top-k queries over business processes. In: ICDE, pp. 1195–1198 (2009)
Deutch, D., Milo, T.: Top-k projection queries for probabilistic business processes. In: Proceedings of ICDT (2009)
Deutch, D., Milo, T., Polyzotis, N., Yam, T.: Optimal top-k query evaluation for weighted business processes. PVLDB 3(1), 940–951 (2010)
Deutch, D., Milo, T., Yam, T.: Goal-oriented web-site navigation for on-line shoppers (demonstration). PVLDB 2(2) (2009)
Ding, B., Xu Yu, J., Wang, S., Qin, L., Zhang, X, Lin, X.: Finding top-k min-cost connected trees in databases. In: ICDE, pp. 836–845 (2007)
Ebay. http://www.ebay.com/
Eppstein, D.: Finding the \(k\) shortest paths. In: Proceedings of 35th Symposium Foundations of Computer Science. IEEE, pp. 154–165 (1994)
Etessami, K., Yannakakis, M.: Algorithmic verification of recursive probabilistic state machines. In: Proceedings of TACAS (2005)
Etessami, K., Yannakakis, M.: Recursive markov chains, stochastic grammars, and monotone systems of nonlinear equations. JACM 56(1) (2009)
Fagin, R., Kumar, R., Sivakumar, D.: Comparing top-k lists. In: Proceedings of SODA (2003)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4) (2003)
Friedman, N., Getoor, L., Koller, D., Pfeffer, A.: Learning probabilistic relational models. In: Proceedings of IJCAI (1999)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4) (2008)
Jones, T.: Estimating Software Costs. McGraw-Hill, New York (2007)
Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Springer, Berlin (1976)
Kimelfeld, B., Sagiv, Y.: Finding and approximating top-k answers in keyword proximity search. In: Proceedings of PODS, pp. 173–182 (2006)
Kimelfeld, B., Sagiv, Y.: Matching twigs in probabilistic xml. In: Proceedings of VLDB (2007)
Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: ACL (2003)
Koudas, N., Srivastava, D.: Data stream query processing: a tutorial. In: Proceedings of VLDB (2003)
Lary, K., Young, S.J.: The estimation of stochastic context-free grammars using the inside-outside algrithm. Comput. Speech Lang. 4, 35–56 (1990)
Nevsetvril, J., de Mendez, P.O.: Tree-depth, subgraph coloring and homomorphism. Eur. J. Comb. 27(6) (2006)
Oates, T., Doshi, S., Huang, F.: Estimating maximum likelihood parameters for stochastic context-free graph grammars. In: Proceedings of ILP (2003)
Pirolli, P.L.T., Pitkow, J.E.: Distributions of surfers’ paths through the world wide web: empirical characterizations. World Wide Web 2(1–2) (1999)
Re, C., Dalvi, N.N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: Proceedings of ICDE (2007)
Read, R.C., Tarjan, R.E.: Bounds on backtrack algorithms for listing cycles, paths, and spanning trees. Networks 5 (1975)
Sanghai, S., Domingos, P., Weld, D.: Dynamic probabilistic relational models. In: Proceedings of IJCAI (2003)
Yahoo! shopping. http://shopping.yahoo.com/
Acknowledgments
This work has been partially funded by the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) / ERC grant MoDaS, agreement 291071, by the Israel Ministry of Science, and by the US-Israel Binational Science foundation.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Deutch, D., Milo, T. & Polyzotis, N. Top-k queries over web applications. The VLDB Journal 22, 519–542 (2013). https://doi.org/10.1007/s00778-012-0303-9
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00778-012-0303-9