Knowledge and Information Systems, Volume 34, Issue 1, pp 109–145

Query-dependent cross-domain ranking in heterogeneous network

  • Bo Wang
  • Jie Tang
  • Wei Fan
  • Songcan Chen
  • Chenhao Tan
  • Zi Yang
Regular Paper

Abstract

Traditional learning-to-rank research focuses mainly on a single type of object. However, with the rapid growth of Web 2.0, ranking over multiple interrelated, heterogeneous objects has become common, for example in heterogeneous academic networks. In this setting, one may have abundant training data for some types of objects (e.g., conferences) but very little for the types of interest (e.g., authors). Two questions therefore arise: (1) Given a networked data set, how can one borrow supervision from other types of objects to build an accurate ranking model for the objects of interest, for which supervision is insufficient? (2) If there are links between different objects, how can their relationships be exploited to improve ranking performance? In this work, we first propose a regularized framework called HCDRank that simultaneously minimizes two loss functions corresponding to the two domains. We then extend the approach by exploiting the link information between heterogeneous objects. We analyze the proposed approach theoretically and derive its generalization bound to show how the two related domains can help each other in learning ranking functions. Experimental results on three different genres of data sets demonstrate the effectiveness of the proposed approaches.
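To make the idea of "simultaneously minimizing two loss functions" concrete, the following is a minimal sketch of one generic way to train a ranker on two domains at once. It is not the HCDRank formulation from the paper. It assumes, purely for illustration, that both domains share one feature space, that the scoring function is linear, and that each loss is a pairwise hinge loss. The names `pairwise_hinge`, `fit_joint_ranker`, `C`, and `lam` are hypothetical.

```python
import numpy as np

def pairwise_hinge(w, X, pairs):
    """Mean hinge loss over (better, worse) index pairs, plus a subgradient."""
    hi = np.array([p[0] for p in pairs])
    lo = np.array([p[1] for p in pairs])
    diff = X[hi] - X[lo]              # one feature-difference row per pair
    margin = diff @ w                 # score(better) - score(worse)
    active = margin < 1.0             # pairs that violate the margin
    loss = np.maximum(0.0, 1.0 - margin).mean()
    grad = -diff[active].sum(axis=0) / len(pairs)
    return loss, grad

def fit_joint_ranker(Xs, pairs_s, Xt, pairs_t, C=1.0, lam=0.01, lr=0.1, steps=500):
    """Subgradient descent on: source loss + C * target loss + lam * ||w||^2.

    Assumption (not from the paper): a single weight vector w is shared by
    the source domain (plentiful labels) and the target domain (few labels).
    """
    w = np.zeros(Xs.shape[1])
    for _ in range(steps):
        _, grad_s = pairwise_hinge(w, Xs, pairs_s)
        _, grad_t = pairwise_hinge(w, Xt, pairs_t)
        w -= lr * (grad_s + C * grad_t + 2.0 * lam * w)
    return w
```

In this sketch, `C` controls how strongly the scarce target-domain pairs count relative to the plentiful source-domain pairs, and `lam` is the weight of the regularizer. Objects with different feature spaces, as in a truly heterogeneous network, would need an additional mapping into a common space, which this sketch does not attempt.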

Keywords

  • Cross-domain ranking
  • Heterogeneous network
  • Latent space
  • Learning to rank

Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  • Bo Wang (1)
  • Jie Tang (2)
  • Wei Fan (3)
  • Songcan Chen (1)
  • Chenhao Tan (2)
  • Zi Yang (2)

  1. Department of Computer Science, Nanjing University of Aeronautics and Astronautics, Nanjing, China
  2. Department of Computer Science, Tsinghua University, Beijing, China
  3. IBM T.J. Watson Research Center, New York, USA
