Top-k queries over web applications

Deutch, Daniel; Milo, Tova; Polyzotis, Neoklis

doi:10.1007/s00778-012-0303-9

Top-k queries over web applications

Regular Paper
Published: 28 December 2012

Volume 22, pages 519–542, (2013)
Cite this article

The VLDB Journal Aims and scope Submit manuscript

Daniel Deutch¹,
Tova Milo² &
Neoklis Polyzotis³

1639 Accesses
3 Citations
Explore all metrics

Abstract

The core logic of web applications that suggest some particular service, such as online shopping, e-commerce etc., is typically captured by Business Processes (BPs). Among all the (maybe infinitely many) possible execution flows of a BP, analysts are often interested in identifying flows that are “most important”, according to some weight metric. The goal of the present paper is to provide efficient algorithms for top-k query evaluation over the possible executions of Business Processes, under some given weight function. Unique difficulties in top-k analysis in this settings stem from (1) the fact that the number of possible execution flows of a given BP is typically very large, or even infinite in presence of recursion and (2) that the weights (e.g., likelihood, monetary cost, etc.) induced by actions performed during the execution (e.g., product purchase) may be inter-dependent (due to probabilistic dependencies, combined discount deals etc.). We exemplify these difficulties, and overcome them to provide efficient algorithms for query evaluation where possible. We also describe in details an application prototype that we have developed for recommending optimal navigation in an online shopping web site that is based on our model and algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Top-k Queries Over Uncertain Scores

Generic Top-k Query Processing with Breadth-First Strategies

Evaluation of IR Applications with Constrained Real Estate

Notes

Certain EX-flows may have equal weights, which implies that there may be several valid solutions to the problem, in which case we pick one arbitrarily. Similarly, the BP specification may have fewer than \(k\) possible EX-flows, in which case we will simply return the set of all possible flows as an answer.
A similar example can be designed for monotonically decreasing function and top-k flows.
Recall that we are interested in finding the top-k finite, full EX-flows.
Recall that we are currently assuming that \(cWeight\) is history-independent, and relax this assumption in Sect. 3.3.
Note that there may be multiple EX-flows tied as best; in this case, we only claim that at least one of them satisfies the lemma.
Again, there may be multiple such rankings due to ties, and we only claim existence of one satisfying the lemma.
This equality is an equality between EX-flows, corresponding to a node-label, edge-relation and preserving isomorphism.
This weight is uniquely defined, since \(cWeight\) has an history bound of \(b\).
For an EX-flow \(e\), recall that \(\varPi ^{\prime }(e)\) is obtained from \(e\) by replacing each activity name \(a\) in \(e\) by \(\pi ^{\prime }(a)\).
To our knowledge, there are no other published benchmarks for top-k algorithms in the context of eb Applications.

References

Abiteboul, S., Senellart, P.: Querying and updating probabilistic information in xml. In: Proceedings of EDBT (2006)
Akbarinia, R., Pacitti, E., Valduriez, P.: Best position algorithms for top-k queries. In: Proceedings of VLDB (2007)
Bast, H., Majumdar, D., Schenkel, R., Theobald, M., Weikum, G.: Io-top-k: index-access optimized top-k query processing. In: Proceedings of VLDB (2006)
Beeri, C., Eyal, A., Kamenkovich, S., Milo, T.: Querying business processes. In: Proceedings of VLDB (2006)
Benedikt, M., Kharlamov, E., Olteanu, D., Senellart, P.: Probabilistic xml via markov chains. PVLDB 3(1), 770–781 (2010)
Google Scholar
Business Process Execution Language for Web Services. http://www.ibm.com/developerworks/library/ws-bpel/
Cohen, S., Kimelfeld, B.: Querying parse trees of stochastic context-free grammars. In: ICDT, pp. 62–75 (2010)
Dalvi, N., Suciu, D.: Efficient query evaluation on probabilistic databases. In: Proceedings of VLDB (2004)
Dechter, R., Pearl, J.: Generalized best-first search strategies and the optimality of a*. J, ACM 32(3) (1985)
Deutch, D., Milo, T.: Type inference and type checking for queries on execution traces. In: Proceedings of VLDB (2008)
Deutch, D., Milo, T.: Evaluating top-k queries over business processes. In: ICDE, pp. 1195–1198 (2009)
Deutch, D., Milo, T.: Top-k projection queries for probabilistic business processes. In: Proceedings of ICDT (2009)
Deutch, D., Milo, T., Polyzotis, N., Yam, T.: Optimal top-k query evaluation for weighted business processes. PVLDB 3(1), 940–951 (2010)
Google Scholar
Deutch, D., Milo, T., Yam, T.: Goal-oriented web-site navigation for on-line shoppers (demonstration). PVLDB 2(2) (2009)
Ding, B., Xu Yu, J., Wang, S., Qin, L., Zhang, X, Lin, X.: Finding top-k min-cost connected trees in databases. In: ICDE, pp. 836–845 (2007)
Ebay. http://www.ebay.com/
Eppstein, D.: Finding the \(k\) shortest paths. In: Proceedings of 35th Symposium Foundations of Computer Science. IEEE, pp. 154–165 (1994)
Etessami, K., Yannakakis, M.: Algorithmic verification of recursive probabilistic state machines. In: Proceedings of TACAS (2005)
Etessami, K., Yannakakis, M.: Recursive markov chains, stochastic grammars, and monotone systems of nonlinear equations. JACM 56(1) (2009)
Fagin, R., Kumar, R., Sivakumar, D.: Comparing top-k lists. In: Proceedings of SODA (2003)
Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4) (2003)
Friedman, N., Getoor, L., Koller, D., Pfeffer, A.: Learning probabilistic relational models. In: Proceedings of IJCAI (1999)
Ilyas, I.F., Beskales, G., Soliman, M.A.: A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40(4) (2008)
Jones, T.: Estimating Software Costs. McGraw-Hill, New York (2007)
Google Scholar
Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Springer, Berlin (1976)
MATH Google Scholar
Kimelfeld, B., Sagiv, Y.: Finding and approximating top-k answers in keyword proximity search. In: Proceedings of PODS, pp. 173–182 (2006)
Kimelfeld, B., Sagiv, Y.: Matching twigs in probabilistic xml. In: Proceedings of VLDB (2007)
Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: ACL (2003)
Koudas, N., Srivastava, D.: Data stream query processing: a tutorial. In: Proceedings of VLDB (2003)
Lary, K., Young, S.J.: The estimation of stochastic context-free grammars using the inside-outside algrithm. Comput. Speech Lang. 4, 35–56 (1990)
Article Google Scholar
Nevsetvril, J., de Mendez, P.O.: Tree-depth, subgraph coloring and homomorphism. Eur. J. Comb. 27(6) (2006)
Oates, T., Doshi, S., Huang, F.: Estimating maximum likelihood parameters for stochastic context-free graph grammars. In: Proceedings of ILP (2003)
Pirolli, P.L.T., Pitkow, J.E.: Distributions of surfers’ paths through the world wide web: empirical characterizations. World Wide Web 2(1–2) (1999)
Re, C., Dalvi, N.N., Suciu, D.: Efficient top-k query evaluation on probabilistic data. In: Proceedings of ICDE (2007)
Read, R.C., Tarjan, R.E.: Bounds on backtrack algorithms for listing cycles, paths, and spanning trees. Networks 5 (1975)
Sanghai, S., Domingos, P., Weld, D.: Dynamic probabilistic relational models. In: Proceedings of IJCAI (2003)
Yahoo! shopping. http://shopping.yahoo.com/

Download references

Acknowledgments

This work has been partially funded by the European Research Council under the European Community’s Seventh Framework Programme (FP7/2007-2013) / ERC grant MoDaS, agreement 291071, by the Israel Ministry of Science, and by the US-Israel Binational Science foundation.

Author information

Authors and Affiliations

Ben Gurion University, Beer-Sheva, Israel
Daniel Deutch
Tel Aviv University, Tel Aviv, Israel
Tova Milo
UC Santa Cruz, Santa Cruz, CA, USA
Neoklis Polyzotis

Authors

Daniel Deutch
View author publications
You can also search for this author in PubMed Google Scholar
Tova Milo
View author publications
You can also search for this author in PubMed Google Scholar
Neoklis Polyzotis
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Daniel Deutch.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Deutch, D., Milo, T. & Polyzotis, N. Top-k queries over web applications. The VLDB Journal 22, 519–542 (2013). https://doi.org/10.1007/s00778-012-0303-9

Download citation

Received: 26 August 2011
Revised: 25 August 2012
Accepted: 26 November 2012
Published: 28 December 2012
Issue Date: August 2013
DOI: https://doi.org/10.1007/s00778-012-0303-9

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Top-k queries over web applications

Abstract

Access this article

Similar content being viewed by others

Top-k Queries Over Uncertain Scores

Generic Top-k Query Processing with Breadth-First Strategies

Evaluation of IR Applications with Constrained Real Estate

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Top-k queries over web applications

Abstract

Access this article

Similar content being viewed by others

Top-k Queries Over Uncertain Scores

Generic Top-k Query Processing with Breadth-First Strategies

Evaluation of IR Applications with Constrained Real Estate

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation