Query-dependent cross-domain ranking in heterogeneous network

Abstract

Traditional learning to rank mainly focuses on a single type of object. However, with the rapid growth of Web 2.0, ranking over multiple interrelated and heterogeneous objects has become common, e.g., in a heterogeneous academic network. In this scenario, one may have plenty of training data for some types of objects (e.g., conferences) but very little for the types of interest (e.g., authors). Two important questions thus arise: (1) Given a networked data set, how can one borrow supervision from other types of objects to build an accurate ranking model for the objects of interest, for which supervision is insufficient? (2) If there are links between different objects, how can their relationships be exploited to improve ranking performance? In this work, we first propose a regularized framework called HCDRank that simultaneously minimizes two loss functions associated with these two domains. We then extend the approach by exploiting the link information between heterogeneous objects. We conduct a theoretical analysis of the proposed approach and derive its generalization bound to show how the two related domains can help each other in learning ranking functions. Experimental results on three different genres of data sets demonstrate the effectiveness of the proposed approaches.
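
To make the idea of minimizing two ranking losses at once more concrete, here is a minimal sketch. It is not the HCDRank algorithm itself. It assumes that both domains have already been mapped into one common feature space, and it uses a pairwise hinge loss, a single shared weight vector, an L2 penalty and plain subgradient descent. The function names, the parameters `lam`, `lr` and `steps`, and the toy data are all hypothetical.

```python
import numpy as np


def pairwise_hinge(w, X, pairs):
    """Mean hinge loss over pairs (i, j), where item i should outrank item j.

    Returns the loss value and one subgradient with respect to w.
    """
    diffs = np.array([X[i] - X[j] for i, j in pairs], dtype=float)
    margins = diffs @ w
    active = margins < 1.0  # pairs that still violate the unit margin
    loss = np.maximum(0.0, 1.0 - margins).mean()
    grad = -diffs[active].sum(axis=0) / len(pairs)
    return loss, grad


def joint_objective(w, source, target, lam):
    """Source loss + target loss + L2 penalty (an assumed, generic form)."""
    loss_s, grad_s = pairwise_hinge(w, *source)
    loss_t, grad_t = pairwise_hinge(w, *target)
    value = loss_s + loss_t + lam * (w @ w)
    return value, grad_s + grad_t + 2.0 * lam * w


def fit(source, target, lam=0.01, lr=0.1, steps=200):
    """Plain subgradient descent on the joint objective."""
    w = np.zeros(source[0].shape[1])
    for _ in range(steps):
        _, grad = joint_objective(w, source, target, lam)
        w = w - lr * grad
    return w


# Hypothetical toy data: feature 0 determines relevance in both domains.
X_source = np.array([[3.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.0, 0.0]])
X_target = np.array([[2.0, 1.0], [0.0, 1.0]])
source = (X_source, [(0, 1), (1, 2), (2, 3)])  # more supervision
target = (X_target, [(0, 1)])                  # very little supervision

w = fit(source, target)
start_value, _ = joint_objective(np.zeros(2), source, target, lam=0.01)
final_value, _ = joint_objective(w, source, target, lam=0.01)
target_ranking = np.argsort(-(X_target @ w))  # best-scored target item first
```

In this sketch the three source pairs and the single target pair update the same weight vector, so supervision from the better-labeled domain also shapes how target items are ranked.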

Author information

Corresponding author

Correspondence to Jie Tang.

About this article

Cite this article

Wang, B., Tang, J., Fan, W. et al. Query-dependent cross-domain ranking in heterogeneous network. Knowl Inf Syst 34, 109–145 (2013). https://doi.org/10.1007/s10115-011-0472-7

Keywords

  • Cross-domain ranking
  • Heterogeneous network
  • Latent space
  • Learning to rank