
Personalized ranking in web databases: establishing and utilizing an appropriate workload

Distributed and Parallel Databases

Abstract

The emergence of the deep Web has given new significance to the problem of ranking database query results. Earlier ranking approaches either analyzed the frequencies of database values and query logs or established user profiles. In contrast, an integrated approach based on a similarity model, which holistically supports user- and query-dependent ranking, was recently proposed (Telang et al., IEEE Transactions on Knowledge and Data Engineering (TKDE), 2011). An important component of this framework is a workload of ranking functions, each of which represents an individual user's preferences towards the results of a specific query. When a query arrives for which no ranking function exists, the similarity model is used instead; it is expected to produce good-quality ranking as long as the workload contains a ranking function for a very similar user-query pair.
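
The fallback lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the framework of Telang et al.: the workload is modeled as a dictionary keyed by (user, query) pairs, the helpers `most_similar_pair` and `rank_results` and the similarity functions `user_sim` and `query_sim` are hypothetical, and the similarity of two pairs is assumed to be the product of their user and query similarities.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a result tuple is a dict of attribute values,
# and a ranking function assigns it a score (higher scores rank first).
Result = Dict[str, float]
RankingFn = Callable[[Result], float]
Pair = Tuple[str, str]  # (user_id, query_id)
Sim = Callable[[str, str], float]


def most_similar_pair(target: Pair, workload: Dict[Pair, RankingFn],
                      user_sim: Sim, query_sim: Sim) -> Pair:
    """Return the workload pair most similar to `target`.

    Assumption: pair similarity = user similarity * query similarity.
    """
    user, query = target
    return max(workload,
               key=lambda p: user_sim(user, p[0]) * query_sim(query, p[1]))


def rank_results(target: Pair, results: List[Result],
                 workload: Dict[Pair, RankingFn],
                 user_sim: Sim, query_sim: Sim) -> List[Result]:
    """Rank `results` with the target's own function if the workload has one,
    otherwise with the function of the most similar workload pair."""
    if target in workload:
        fn = workload[target]
    else:
        fn = workload[most_similar_pair(target, workload, user_sim, query_sim)]
    return sorted(results, key=fn, reverse=True)


# Toy usage: u3 resembles u1, so u1's price-sensitive function is borrowed.
workload: Dict[Pair, RankingFn] = {
    ("u1", "q1"): lambda t: -t["price"],  # prefers cheaper cars
    ("u2", "q2"): lambda t: t["year"],    # prefers newer cars
}
user_sim: Sim = lambda a, b: 1.0 if a == b else (0.8 if {a, b} == {"u1", "u3"} else 0.1)
query_sim: Sim = lambda a, b: 1.0 if a == b else 0.5
cars = [{"price": 20.0, "year": 2010.0}, {"price": 10.0, "year": 2005.0}]
ranked = rank_results(("u3", "q1"), cars, workload, user_sim, query_sim)
```

Scanning the whole workload is linear in its size K, which stays cheap because K is expected to be much smaller than the number of users and queries.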

In this paper, we address the problem of determining an appropriate set of user-query pairs to form a workload of ranking functions that supports user- and query-dependent ranking for Web databases. We propose a novel metric, termed workload goodness, that quantifies how "good" a workload is as an absolute value. Finding a workload of optimal goodness is a combinatorially explosive problem; we therefore propose a heuristic solution and present three approaches for determining an acceptable workload in both static and dynamic environments. We evaluate the effectiveness of our proposal analytically as well as experimentally over two Web databases.
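
The paper's definition of workload goodness is not reproduced here. As a hedged stand-in, the sketch below uses a hypothetical coverage score, in which each candidate user-query pair is credited with its similarity to the closest pair in the workload, together with a standard greedy heuristic that adds one pair at a time, always choosing the candidate that yields the highest score, until K pairs are selected. The names `goodness` and `greedy_workload` and the toy similarity function are assumptions for illustration only.

```python
from typing import Callable, Hashable, List, Sequence

PairSim = Callable[[Hashable, Hashable], float]


def goodness(workload: Sequence[Hashable], universe: Sequence[Hashable],
             sim: PairSim) -> float:
    """Hypothetical coverage score: each pair in `universe` contributes its
    similarity to the most similar member of `workload`."""
    if not workload:
        return 0.0
    return sum(max(sim(p, w) for w in workload) for p in universe)


def greedy_workload(universe: Sequence[Hashable], sim: PairSim,
                    k: int) -> List[Hashable]:
    """Greedily pick up to k distinct pairs from `universe`, each time adding
    the candidate that maximizes the coverage score of the enlarged workload."""
    chosen: List[Hashable] = []
    for _ in range(min(k, len(universe))):
        candidates = [c for c in universe if c not in chosen]
        best = max(candidates,
                   key=lambda c: goodness(chosen + [c], universe, sim))
        chosen.append(best)
    return chosen


# Toy usage: pairs embedded as points on a line forming two clusters; K = 2.
sim: PairSim = lambda a, b: 1.0 / (1.0 + abs(a - b))
picked = greedy_workload([0, 1, 2, 10, 11, 12], sim, 2)
```

This naive version makes on the order of n²K² similarity calls for n candidates, so it only illustrates the shape of a heuristic; it avoids enumerating every size-K subset, which is what makes the exact problem combinatorially explosive.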

Fig. 1
Fig. 2
Fig. 3
Algorithm 1
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Notes

  1. The concept of workload here is significantly different from the one in traditional databases. Here, the workload is a collection of ranking functions along with the user-query pairs from which the functions are derived; in traditional databases, it refers to a log of queries.

  2. A ranking function is obtained via a learning model, proposed in [30], that analyzes a user’s preferences towards the results of a query.

  3. The functional details of the similarity-based ranking framework are elaborated in Sect. 2.

  4. Given that we focus on establishing only $W_K$, we use the terms workload and $W_K$ interchangeably for the rest of the paper.

  5. Typically, the numbers of users and queries on most real Web databases, such as Yahoo! Autos and Google Base, are extremely large, whereas the value of K is much smaller.

  6. The value ‘any’ will match all possible values for the domain of the particular attribute. For example, a value of ‘any’ for the Transmission attribute in a Vehicle database retrieves cars with ‘manual’ as well as ‘auto’ transmission.

  7. Without loss of generality, we assume $\{Q_1, Q_2, \ldots, Q_r\}$ are the common queries for $U$ and $U'$, although they can be any queries.

  8. As elaborated in Sect. 2, since the highest rank of 0 is assigned to the pair itself, the next highest possible rank, computed by (1), of a user-query pair with respect to a given pair is 1.
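
The 'any' wildcard of note 6 and the rank convention of note 8 can be sketched together as follows. Both functions are hypothetical illustrations: `matches` assumes equality-only selection conditions over a toy Vehicle table (the `Make` attribute is an assumption), and `ranks_wrt` assumes ranks are obtained by sorting the other pairs by decreasing similarity to the given pair; it does not reproduce equation (1).

```python
from typing import Callable, Dict, Hashable, Iterable

ANY = "any"  # wildcard value described in note 6


def matches(query: Dict[str, str], tup: Dict[str, str]) -> bool:
    """True if every condition in `query` is 'any' or equals the tuple's value."""
    return all(value == ANY or tup.get(attr) == value
               for attr, value in query.items())


def ranks_wrt(pair: Hashable, pairs: Iterable[Hashable],
              sim: Callable[[Hashable, Hashable], float]) -> Dict[Hashable, int]:
    """Rank pairs by decreasing similarity to `pair`. The pair itself gets
    rank 0, so the most similar other pair gets rank 1 (cf. note 8)."""
    others = sorted((p for p in pairs if p != pair),
                    key=lambda p: sim(pair, p), reverse=True)
    ranks: Dict[Hashable, int] = {pair: 0}
    for r, p in enumerate(others, start=1):
        ranks[p] = r
    return ranks


# Toy usage: 'any' on Transmission keeps both manual and auto Hondas.
cars = [
    {"Make": "Honda", "Transmission": "manual"},
    {"Make": "Honda", "Transmission": "auto"},
    {"Make": "Toyota", "Transmission": "auto"},
]
hondas = [c for c in cars if matches({"Make": "Honda", "Transmission": ANY}, c)]
```

Ties in similarity keep their input order because Python's sort is stable; a real implementation would need an explicit tie-breaking rule.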

References

  1. Agrawal, R., Rantzau, R., Terzi, E.: Context-sensitive ranking. In: SIGMOD Conference, pp. 383–394. ACM, New York (2006)

  2. Agrawal, S., Chaudhuri, S., Das, G., Gionis, A.: Automated ranking of database query results. In: Conference on Innovative Data Systems Research (CIDR) (2003)

  3. Balabanovic, M., Shoham, Y.: Content-based collaborative recommendation. Commun. ACM 40(3), 66–72 (1997)

  4. Basilico, J., Hofmann, T.: A joint framework for collaborative and content filtering. In: SIGIR, pp. 550–551 (2004)

  5. Basu, C., Hirsh, H., Cohen, W.W.: Recommendation as classification: using social and content-based information in recommendation. In: AAAI/IAAI, pp. 714–720 (1998)

  6. Bergman, M.K.: The deep web: surfacing hidden value. J. Electron. Publ. 7(1) (2001)

  7. Billsus, D., Pazzani, M.J.: Learning collaborative information filters. In: International Conference on Machine Learning (ICML), pp. 46–54 (1998)

  8. Blum, M., Floyd, R.W., Pratt, V., Rivest, R.L., Tarjan, R.E.: Time bounds for selection. J. Comput. Syst. Sci. 7, 448–461 (1973)

  9. Chang, K.C.-C., He, B., Li, C., Patil, M., Zhang, Z.: Structured databases on the web: observations and implications. SIGMOD Rec. 33(3), 61–70 (2004)

  10. Chaudhuri, S., Das, G., Hristidis, V., Weikum, G.: Probabilistic ranking of database query results. In: VLDB, pp. 888–899 (2004)

  11. Chaudhuri, S., Das, G., Hristidis, V., Weikum, G.: Probabilistic information retrieval approach for ranking of database query results. ACM Trans. Database Syst. 31(3), 1134–1168 (2006)

  12. Foltz, P.W., Dumais, S.T.: Personalized information delivery: an analysis of information filtering methods. Commun. ACM 35(12), 51–60 (1992)

  13. Gauch, S., Speretta, M., Chandramouli, A., Micarelli, A.: User profiles for personalized information access. In: The Adaptive Web, pp. 54–89 (2007)

  14. Google: Google Base. http://www.google.com/base

  15. Hofmann, T.: Collaborative filtering via Gaussian probabilistic latent semantic analysis. In: SIGIR, pp. 259–266 (2003)

  16. Hwang, S.-W.: Supporting ranking for data retrieval. Ph.D. thesis, University of Illinois at Urbana-Champaign (2005)

  17. Ilyas, I.F., Soliman, M.A.: Probabilistic Ranking Techniques in Relational Databases. Synthesis Lectures on Data Management. Morgan & Claypool Publishers (2011)

  18. Kanungo, T., Mount, D.: An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)

  19. Koutrika, G.: Database query personalization. In: EDBT, pp. 147–152 (2005)

  20. Koutrika, G., Ioannidis, Y.E.: Personalization of queries in database systems. In: ICDE, pp. 597–608 (2004)

  21. Koutrika, G., Ioannidis, Y.E.: Constrained optimalities in query personalization. In: SIGMOD Conference, pp. 73–84 (2005)

  22. Li, C., Chang, K.C.-C., Ilyas, I.F., Song, S.: RankSQL: query algebra and optimization for relational top-k queries. In: SIGMOD Conference, pp. 131–142 (2005)

  23. Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)

  24. Ortega-Binderberger, M., Chakrabarti, K., Mehrotra, S.: An approach to integrating query refinement in SQL. In: EDBT, pp. 15–33 (2002)

  25. Schein, A.I., Popescul, A., Ungar, L.H., Pennock, D.M.: Methods and metrics for cold-start recommendations. In: SIGIR, pp. 253–260 (2002)

  26. Soliman, M.A., Ilyas, I.F., Ben-David, S.: Supporting ranking queries on uncertain and incomplete data. VLDB J. 19(4), 477–501 (2010)

  27. Soliman, M.A., Ilyas, I.F., Martinenghi, D., Tagliasacchi, M.: Ranking with uncertain scoring functions: semantics and sensitivity measures. In: SIGMOD Conference, pp. 805–816 (2011)

  28. Su, W., Wang, J., Huang, Q., Lochovsky, F.: Query result ranking over e-commerce web databases. In: Conference on Information and Knowledge Management (CIKM), pp. 575–584 (2006)

  29. Telang, A., Li, C., Chakravarthy, S.: One size does not fit all: towards user- and query-dependent ranking for web databases. Technical report 6, University of Texas at Arlington (2009)

  30. Telang, A., Li, C., Chakravarthy, S.: One size does not fit all: towards user- and query-dependent ranking for web databases. IEEE Transactions on Knowledge and Data Engineering (TKDE) (2011)

  31. Kießling, W.: Foundations of preferences in database systems. In: VLDB, pp. 311–322. VLDB Endowment (2002)

  32. Yu, H., Hwang, S.-w., Chang, K.C.-C.: Enabling soft queries for data retrieval. Inf. Syst. 32(4), 560–574 (2007)

  33. Yu, H., Kim, Y., Hwang, S.-w.: RV-SVM: an efficient method for learning ranking SVM. In: PAKDD, pp. 426–438 (2009)

Author information

Corresponding author

Correspondence to Aditya Telang.

Additional information

Communicated by: Kaushik Chakrabarti.

About this article

Cite this article

Telang, A., Chakravarthy, S. & Li, C. Personalized ranking in web databases: establishing and utilizing an appropriate workload. Distrib Parallel Databases 31, 47–70 (2013). https://doi.org/10.1007/s10619-012-7106-2

  • DOI: https://doi.org/10.1007/s10619-012-7106-2