Efficient question classification and retrieval using category information and word embedding on cQA services
Question classification, the task of automatically assigning unlabeled questions to predefined categories (or topics), and the effective retrieval of similar questions are crucial aspects of an effective cQA service. First, we address the problems associated with estimating the distribution of words in each category and using it to compute word weights. We then apply to question classification an automatic expansion-word generation technique based on our proposed weighting method and pseudo-relevance feedback. Second, to address the lexical gap problem in question retrieval, we define the case frame of a sentence from its extracted components and derive a similarity measure, based on the case frame and word embedding, that determines the similarity between two sentences. These similarities are then used to reorder the results of the first retrieval model. Consequently, the proposed methods significantly improve the performance of question classification and retrieval.
Keywords: Question classification, Word weighting method, Category information, Pseudo-relevance feedback, Question expansion
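The reordering step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the case-frame similarity of the paper, whose definition is not given here; as a stand-in, it assumes that sentence similarity is the cosine between averaged word vectors and that the final score is a linear mix of the first retrieval score and that similarity. The toy vectors, the `rerank` function, and the `alpha` weight are all hypothetical.

```python
import math

# Toy 3-dimensional word vectors; illustrative values only, not trained embeddings.
EMBEDDINGS = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.8, 0.2, 0.0],
    "battery": [0.1, 0.9, 0.0],
    "charge": [0.2, 0.8, 0.1],
    "recipe": [0.0, 0.1, 0.9],
    "pasta": [0.0, 0.0, 1.0],
}


def sentence_vector(tokens):
    """Average the vectors of known tokens; return None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_tokens, candidates, alpha=0.5):
    """Reorder (tokens, first_pass_score) pairs by a mixed score.

    alpha weights the first-pass score; (1 - alpha) weights the embedding
    similarity. This linear mix is an assumption made for illustration.
    """
    q = sentence_vector(query_tokens)
    rescored = []
    for tokens, score in candidates:
        c = sentence_vector(tokens)
        sim = cosine(q, c) if q is not None and c is not None else 0.0
        rescored.append((tokens, alpha * score + (1 - alpha) * sim))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


query = ["laptop", "battery"]
candidates = [
    (["pasta", "recipe"], 0.6),     # higher first-pass score, unrelated meaning
    (["notebook", "charge"], 0.5),  # no shared words, but close in vector space
]
ranked = rerank(query, candidates)
for tokens, score in ranked:
    print(" ".join(tokens), round(score, 3))
```

In this toy setting the second candidate shares no word with the query, yet it moves to the top after reordering because its averaged vector is close to the query's, which is the kind of lexical gap the abstract refers to.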
This work was supported by an Institute for Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2013-2-00131, Development of Knowledge Evolutionary WiseQA Platform Technology for Human Knowledge Augmented Services).