
Knowledge and Information Systems, Volume 43, Issue 3, pp 529–553

SFP-Rank: significant frequent pattern analysis for effective ranking

  • Yuanfeng Song
  • Wilfred Ng
  • Kenneth Wai-Ting Leung
  • Qiong Fang
Regular Paper

Abstract

Ranking documents by their relevance to a given query is fundamental to many real-life applications, such as information retrieval and recommendation systems. Extensive study in these domains has led to many efficient ranking models. While most existing research focuses on developing learning to rank (LTR) models, the quality of the training features, which plays an important role in ranking performance, has not been fully studied. We therefore propose a new approach that discovers effective features for the LTR problem. In this paper, we present a theoretical analysis of which frequent patterns are potentially effective for improving LTR performance, and we then propose an efficient method that selects frequent patterns for LTR. First, we define a new criterion, namely feature significance (or simply significance): we use each feature's value to rank the training instances and take the resulting ranking effectiveness, measured by a performance measure, as the significance of that feature. By establishing a formal connection between a pattern's support and its significance, we show that the significance of an infrequent pattern is bounded. Next, we propose a methodology for setting the minimum support threshold when performing frequent pattern mining. Finally, since frequent patterns are not equally effective for LTR, we provide a coverage-based significant pattern generation algorithm to discover effective patterns, and we propose a new ranking approach called Significant Frequent Pattern-based Ranking (SFP-Rank), in which the ranking model is built upon both the original features and the significant frequent patterns. Our experiments confirm that incorporating significant frequent patterns into the training of the ranking model can substantially improve its performance.
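The significance criterion can be illustrated with a short sketch. The abstract does not fix the performance measure, so the code below assumes NDCG@k averaged over queries. It also assumes that a frequent pattern is a set of discretized feature items (for example `("f0", "high")`) that becomes a binary feature equal to 1 when an instance contains every item of the pattern. The names `significance`, `pattern_column`, and `support` are hypothetical; this is not the authors' implementation, and neither the coverage-based pattern generation nor the support-threshold methodology is reproduced here.

```python
import math
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical discretized item, e.g. ("f0", "high"); a pattern is a frozenset of items.
Item = Tuple[str, str]


def dcg_at_k(labels: Sequence[int], k: int) -> float:
    # Common gain formulation: (2^rel - 1) / log2(position + 1), positions starting at 1.
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels[:k]))


def ndcg_at_k(ranked_labels: Sequence[int], k: int) -> float:
    # Normalize by the DCG of the ideal (label-sorted) ordering of the same instances.
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


def significance(query_ids: Sequence[str], labels: Sequence[int],
                 values: Sequence[float], k: int = 10) -> float:
    """Rank each query's instances by `values` (descending) and return the
    mean NDCG@k over queries. Ties keep input order because sorted() is stable."""
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for qid, label, value in zip(query_ids, labels, values):
        groups[qid].append((value, label))
    per_query = []
    for pairs in groups.values():
        ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
        per_query.append(ndcg_at_k([label for _, label in ranked], k))
    return sum(per_query) / len(per_query) if per_query else 0.0


def pattern_column(transactions: Sequence[FrozenSet[Item]],
                   pattern: FrozenSet[Item]) -> List[float]:
    """Binary feature: 1.0 if an instance contains every item of the pattern."""
    return [1.0 if pattern <= items else 0.0 for items in transactions]


def support(transactions: Sequence[FrozenSet[Item]], pattern: FrozenSet[Item]) -> float:
    """Fraction of instances that contain the pattern."""
    column = pattern_column(transactions, pattern)
    return sum(column) / len(column)


if __name__ == "__main__":
    qids = ["q1", "q1", "q1"]
    labels = [2, 0, 1]
    print(significance(qids, labels, [0.9, 0.1, 0.5]))  # 1.0: ranks labels as 2, 1, 0
    tx = [frozenset({("f0", "high")}), frozenset({("f0", "low")}), frozenset({("f0", "mid")})]
    p = frozenset({("f0", "high")})
    print(support(tx, p), significance(qids, labels, pattern_column(tx, p)))
```

Because an original feature column and a pattern column are both passed to `significance` in the same way, the two kinds of features are scored on one scale, which loosely mirrors the abstract's idea of training on the original features together with selected patterns.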

Keywords

Learning to rank, Frequent patterns, Feature selection, Combined features, Ranking performance

Notes

Acknowledgments

We thank anonymous reviewers for their very useful comments and suggestions. This work is supported by HKUST GRF Grant 617610.

Copyright information

© Springer-Verlag London 2014

Authors and Affiliations

  • Yuanfeng Song (1)
  • Wilfred Ng (1)
  • Kenneth Wai-Ting Leung (1)
  • Qiong Fang (1), corresponding author
  1. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Kowloon, Hong Kong, China
