
FP-Rank: An Effective Ranking Approach Based on Frequent Pattern Analysis

  • Yuanfeng Song
  • Kenneth Leung
  • Qiong Fang
  • Wilfred Ng
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7826)

Abstract

Ranking documents by their relevance to a given query is fundamental to many real-life applications, such as document retrieval and recommendation systems. Extensive studies in this area have focused on developing efficient ranking models. Ranking models are usually trained on given training datasets, so besides the training algorithm, the quality of the document features selected for training also plays a very important role in model performance. The main objective of this paper is to present an approach to discovering “significant” document features for the learning to rank (LTR) problem. We conduct a systematic exploration of frequent pattern-based ranking. First, we formally analyze the effectiveness of frequent patterns for ranking. Combined features, which constitute a large portion of frequent patterns, perform better than single features at capturing the rich underlying semantics of the documents and hence provide good feature candidates for ranking. Based on our analysis, we propose a new ranking approach called FP-Rank. Essentially, FP-Rank applies frequent pattern mining algorithms to mine frequent patterns and then uses a new pattern selection algorithm to select a set of patterns with high overall significance and low redundancy. Our experiments on real datasets confirm that incorporating effective frequent patterns when training a ranking model, such as RankSVM, can substantially improve the model's performance.
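
The two stages named in the abstract (mine frequent patterns, then keep a small set with high significance and low redundancy) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the discretized feature items such as "bm25=high", the brute-force miner, the significance score (shift in relevance rate among covered documents), the Jaccard-based redundancy, and the greedy selection with a `penalty` weight. The paper's own selection algorithm is not described on this page.

```python
from itertools import combinations


def mine_frequent_patterns(transactions, min_support, max_size=2):
    """Brute-force miner: count every itemset up to max_size items and
    keep those whose relative support reaches min_support."""
    n = len(transactions)
    counts = {}
    for items in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def cover(pattern, transactions):
    """Indices of the documents that contain every item of the pattern."""
    return {i for i, t in enumerate(transactions) if pattern <= t}


def significance(pattern, transactions, labels):
    """Hypothetical score: how far the relevance rate among covered
    documents departs from the overall relevance rate."""
    covered = cover(pattern, transactions)
    if not covered:
        return 0.0
    rate = sum(labels[i] for i in covered) / len(covered)
    return abs(rate - sum(labels) / len(labels))


def redundancy(a, b, transactions):
    """Hypothetical redundancy: Jaccard overlap of the two covers."""
    cov_a, cov_b = cover(a, transactions), cover(b, transactions)
    union = cov_a | cov_b
    return len(cov_a & cov_b) / len(union) if union else 0.0


def select_patterns(patterns, transactions, labels, k, penalty=0.5):
    """Greedy selection: repeatedly take the candidate whose significance,
    minus a penalty for overlap with already selected patterns, is largest."""
    selected = []
    candidates = set(patterns)

    def gain(p):
        worst = max((redundancy(p, s, transactions) for s in selected), default=0.0)
        return significance(p, transactions, labels) - penalty * worst

    while candidates and len(selected) < k:
        # Ties are broken by the sorted item names to keep runs deterministic.
        best = max(candidates, key=lambda p: (gain(p), tuple(sorted(p))))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy query-document pairs: each is a set of discretized feature items
# (made-up names), with a binary relevance label.
transactions = [
    frozenset({"bm25=high", "tf=high"}),
    frozenset({"bm25=high", "tf=high"}),
    frozenset({"bm25=high", "tf=low"}),
    frozenset({"bm25=low", "tf=high"}),
    frozenset({"bm25=low", "tf=low"}),
    frozenset({"bm25=low", "tf=low"}),
]
labels = [1, 1, 0, 0, 0, 0]

patterns = mine_frequent_patterns(transactions, min_support=0.3)
selected = select_patterns(patterns, transactions, labels, k=2)
print(selected)
```

In a full pipeline, each selected pattern would presumably become one binary feature (pattern present or absent) appended to a document's feature vector before training a ranker such as RankSVM. On this toy data the combined pattern of both "high" items is selected first, because it separates relevant from non-relevant documents better than either single item does.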

Keywords

Learning to rank, frequent pattern, combined features, feature selection, ranking performance



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yuanfeng Song
    • 1
  • Kenneth Leung
    • 1
  • Qiong Fang
    • 1
  • Wilfred Ng
    • 1
  1. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China
