Data Mining and Knowledge Discovery, Volume 32, Issue 5, pp 1339–1367

ColdRoute: effective routing of cold questions in stack exchange sites

  • Jiankai Sun
  • Abhinav Vishnu
  • Aniket Chakrabarti
  • Charles Siegel
  • Srinivasan Parthasarathy

Part of the topical collection: Journal Track of ECML PKDD 2018

Abstract

Routing questions in Community Question Answer services such as Stack Exchange sites is a well-studied problem. Yet cold-start, a phenomenon observed when a new question is posted, is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute routes cold questions posted by new or existing askers to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags, and we compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we improve the routing metrics (Precision@1, Accuracy, MRR) over state-of-the-art models such as semantic matching by 159.5%, 31.84%, and 40.36% for cold questions posted by existing askers, and by 123.1%, 27.03%, and 34.81% for cold questions posted by new askers, respectively.
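
To make the idea of a Factorization Machine over one-hot features concrete, the following is a minimal sketch of the standard second-order FM scoring function applied to a sparse binary feature vector. It is an illustration only, not ColdRoute's implementation: the feature names (`tag=...`, `answerer=...`), the vocabulary, the helper names, and the randomly initialised parameters are all assumptions made for this example, and no training is shown.

```python
import random
from typing import Dict, List


def encode(names: List[str], vocab: Dict[str, int]) -> List[int]:
    """Return the indices that are 1 in the one-hot vector.

    Names missing from the vocabulary (e.g. an unseen tag) are skipped.
    """
    return sorted({vocab[n] for n in names if n in vocab})


def fm_score(active: List[int], w0: float, w: List[float],
             V: List[List[float]]) -> float:
    """Second-order FM score for a binary input with nonzeros at `active`.

    Uses the linear-time identity
      sum_{i<j} <v_i, v_j> = 0.5 * sum_f [(sum_i v_if)^2 - sum_i v_if^2].
    """
    linear = sum(w[i] for i in active)
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] for i in active)
        s_sq = sum(V[i][f] ** 2 for i in active)
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_score_naive(active: List[int], w0: float, w: List[float],
                   V: List[List[float]]) -> float:
    """Same score computed directly over all feature pairs (for checking)."""
    linear = sum(w[i] for i in active)
    pairwise = 0.0
    for a in range(len(active)):
        for b in range(a + 1, len(active)):
            vi, vj = V[active[a]], V[active[b]]
            pairwise += sum(x * y for x, y in zip(vi, vj))
    return w0 + linear + pairwise


# Hypothetical vocabulary: two tags and two candidate answerers.
vocab = {"tag=python": 0, "tag=pandas": 1, "answerer=u1": 2, "answerer=u2": 3}

# Hypothetical parameters; in practice these would be learned from data.
rng = random.Random(0)
k = 3  # number of latent factors (assumed)
w0 = 0.0
w = [rng.uniform(-0.1, 0.1) for _ in vocab]
V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in vocab]

# Score each candidate answerer for a new question carrying two tags.
question_tags = ["tag=python", "tag=pandas"]
scores = {
    cand: fm_score(encode(question_tags + [cand], vocab), w0, w, V)
    for cand in ("answerer=u1", "answerer=u2")
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because a cold question has no answer history, only its tags and the candidate's identity are active in this sketch; the pairwise term lets a tag's latent vector interact with each answerer's latent vector, which is what allows candidates to be ranked for a question never seen before.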

Keywords

Question routing, Expert finding, Cold-start problem, Question answering services

Notes

Acknowledgements

This work is supported by NSF grants CCF-1645599, IIS-1550302, and CNS-1513120, and a grant from the Ohio Supercomputer Center (PAS0166).

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. The Ohio State University, Columbus, USA
  2. Pacific Northwest National Laboratory, Richland, USA
  3. Microsoft, Albuquerque, USA