ColdRoute: effective routing of cold questions in stack exchange sites

Abstract

Routing questions in Community Question Answering services such as Stack Exchange sites is a well-studied problem. Yet cold-start, a phenomenon observed when a new question is posted, is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute routes cold questions, whether posted by new or existing askers, to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags, and we compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we improve the routing metrics (Precision@1, Accuracy, and MRR) over state-of-the-art models such as semantic matching by 159.5%, 31.84%, and 40.36%, respectively, for cold questions posted by existing askers, and by 123.1%, 27.03%, and 34.81%, respectively, for cold questions posted by new askers.
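
To make the modelling idea in the abstract concrete, the following is a minimal sketch of scoring candidate answerers with a standard second-order Factorization Machine over one-hot features. It is not the authors' implementation: the feature layout (a multi-hot block for question tags followed by a one-hot block for the candidate answerer), the function names, and the randomly initialised parameters are all assumptions made for illustration, and no training step is shown. The reciprocal-rank helper illustrates how a per-question contribution to MRR could be computed from such scores.

```python
import numpy as np


def multi_hot(indices, size):
    """Return a 0/1 vector of length `size` with ones at `indices`."""
    v = np.zeros(size)
    v[list(indices)] = 1.0
    return v


def fm_score(x, w0, w, V):
    """Second-order FM score: w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j.

    The pairwise term uses the usual O(k * n) reformulation:
    0.5 * sum_f [ (sum_i V[i, f] x_i)^2 - sum_i V[i, f]^2 x_i^2 ].
    """
    linear = w0 + w @ x
    xv = x @ V                     # shape (k,)
    x2v2 = (x ** 2) @ (V ** 2)     # shape (k,)
    return linear + 0.5 * np.sum(xv ** 2 - x2v2)


def reciprocal_rank(scores, true_best):
    """1 / rank of `true_best` when candidates are sorted by descending score."""
    order = np.argsort(-np.asarray(scores))
    rank = int(np.where(order == true_best)[0][0]) + 1
    return 1.0 / rank


# Hypothetical toy setup: 4 tags, 3 candidate answerers, 2 latent factors.
n_tags, n_users, k = 4, 3, 2
rng = np.random.default_rng(0)
w0 = 0.0
w = rng.normal(scale=0.1, size=n_tags + n_users)       # untrained weights
V = rng.normal(scale=0.1, size=(n_tags + n_users, k))  # untrained factors

# A cold question is described only by its tags (here tags 0 and 2),
# so it can be scored without any answer history for the question itself.
question_tags = multi_hot([0, 2], n_tags)
scores = [
    fm_score(np.concatenate([question_tags, multi_hot([u], n_users)]), w0, w, V)
    for u in range(n_users)
]
print("candidate scores:", scores)
print("reciprocal rank if answerer 1 gave the best answer:",
      reciprocal_rank(scores, true_best=1))
```

Because the question enters the model only through its tag features in this sketch, a newly posted question with known tags gets a well-defined score for every candidate answerer, which is the property that makes a feature-based model attractive for cold-start routing.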

Notes

  1. https://physics.stackexchange.com/.

  2. More technical details can be found in Sect. 5.3.

  3. Other Stack Exchange sites demonstrate a similar trend. To save space, we report results for eight large and popular Stack Exchange sites in our paper.

  4. As Doc2Vec is closely related to Word2Vec, we report only Doc2Vec in our experiments.

  5. https://stackoverflow.com/help/why-vote.

  6. Questions which have at least five answers are selected for evaluation.

  7. https://stackoverflow.com/help/tagging.

  8. This forum is accessible from https://www.java-forums.org/forum.php.

  9. Detailed statistics can be seen in Table 3.

  10. We used the data dump which was released on June 12, 2017 and is available online at https://archive.org/details/stackexchange.

  11. http://scikit-learn.org/stable/modules/feature_extraction.html.

  12. https://radimrehurek.com/gensim/models/doc2vec.html.

  13. https://radimrehurek.com/gensim/models/ldamodel.html.

  14. https://www.cs.utexas.edu/~rofuyu/libpmf/.

  15. https://keras.io/.

  16. http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html.

  17. http://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVR.html.

  18. ne is short for newly posted questions asked by existing askers.

  19. \(\frac{|\mathcal {T}|}{|\mathcal {T}| + |\mathcal {Q}| +|\mathcal {U}|}\), where \(|\mathcal {T}| + |\mathcal {Q}| +|\mathcal {U}|\) is the length of the feature vector used by ColdRoute-T.

  20. nn is short for newly posted questions asked by new askers.

Acknowledgements

This work is supported by NSF grants CCF-1645599, IIS-1550302, and CNS-1513120, and a grant from the Ohio Supercomputer Center (PAS0166).

Author information

Corresponding author

Correspondence to Jiankai Sun.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Aniket Chakrabarti: Work done while at The Ohio State University.

Responsible editors: Jesse Davis, Elisa Fromont, Derek Greene, Björn Bringmann.

About this article

Cite this article

Sun, J., Vishnu, A., Chakrabarti, A. et al. ColdRoute: effective routing of cold questions in stack exchange sites. Data Min Knowl Disc 32, 1339–1367 (2018). https://doi.org/10.1007/s10618-018-0577-7

Keywords

  • Question routing
  • Expert finding
  • Cold-start problem
  • Question answering services