Classifying Forum Questions Using PCA and Machine Learning for Improving Online CQA

Fong, Simon; Zhuang, Yan; Liu, Kexing; Zhou, Shu

doi:10.1007/978-981-287-936-3_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 545)

Included in the following conference series:

International Conference on Soft Computing in Data Science

Abstract

As one of the most popular e-Business models, community question answering (CQA) services gather an increasingly large amount of knowledge through the voluntary contributions of online communities across the globe. While most questions in CQA receive an answer posted by peer users, the number of unanswered or ignored questions has soared in the past few years. Understanding the factors that contribute to questions being answered, as well as to questions remaining ignored, can help forum users improve the quality of their questions and increase their chances of getting answers from the forum. In this study, a feature selection method called Principal Component Analysis was used to extract the factors or components of the features. Data mining techniques were then used to identify the relevant features that help predict the quality of questions.
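
The abstract names the two stages (Principal Component Analysis, then a classifier) but not the features, the number of components, or the learning algorithm. The following is only a minimal sketch of that general two-stage idea, not the authors' method. It assumes two hypothetical numeric features per question (a body-detail score and a tag-relevance score), invented toy data, a single principal component found by power iteration, and a nearest-centroid rule standing in for the unspecified data mining technique.

```python
import math

# Hypothetical toy data, not from the paper: each row is
# [body_detail_score, tag_relevance_score] for one question.
# Label 1 = answered, 0 = ignored.
QUESTIONS = [
    [8.0, 7.5], [9.0, 8.0], [7.5, 8.5], [8.5, 9.0],
    [2.0, 1.5], [1.0, 2.5], [2.5, 2.0], [1.5, 1.0],
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def column_means(rows):
    """Mean of each feature column."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows, means):
    """Sample covariance matrix of the mean-centered features."""
    n, d = len(rows), len(means)
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def first_component(cov, iterations=200):
    """Power iteration: approximates the dominant eigenvector of cov,
    which is the first principal component (returned with unit length)."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(row, means, component):
    """Score of one question along the principal component."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))


# Stage 1: reduce the features to one component score per question.
MEANS = column_means(QUESTIONS)
COMPONENT = first_component(covariance(QUESTIONS, MEANS))
SCORES = [project(q, MEANS, COMPONENT) for q in QUESTIONS]

# Stage 2: nearest-centroid classifier on the component scores
# (an assumed stand-in for the paper's classifier).
CENTROIDS = {
    label: sum(s for s, y in zip(SCORES, LABELS) if y == label) / LABELS.count(label)
    for label in set(LABELS)
}


def predict(row):
    """Return the label whose centroid is closest to the row's score."""
    score = project(row, MEANS, COMPONENT)
    return min(CENTROIDS, key=lambda label: abs(score - CENTROIDS[label]))


print(predict([8.0, 8.0]), predict([1.5, 2.0]))
```

With real forum data the features would need scaling before PCA, and more than one component would usually be retained; both choices are left out here to keep the sketch short.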

Author information

Authors and Affiliations

Department of Computer Information Science, University of Macau, Macau SAR, China
Simon Fong, Yan Zhuang & Kexing Liu
Department of Product Marketing, MOZAT Pte Ltd, Singapore, Singapore
Shu Zhou

Corresponding author

Correspondence to Simon Fong.

Editor information

Editors and Affiliations

University of Tennessee, Knoxville, Tennessee, USA
Michael W. Berry
Universiti Teknologi MARA, Shah Alam, Malaysia
Azlinah Mohamed
Universiti Teknologi MARA, Shah Alam, Malaysia
Bee Wah Yap

About this paper

Cite this paper

Fong, S., Zhuang, Y., Liu, K., Zhou, S. (2015). Classifying Forum Questions Using PCA and Machine Learning for Improving Online CQA. In: Berry, M., Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2015. Communications in Computer and Information Science, vol 545. Springer, Singapore. https://doi.org/10.1007/978-981-287-936-3_2

DOI: https://doi.org/10.1007/978-981-287-936-3_2
Published: 12 November 2015
Publisher Name: Springer, Singapore
Print ISBN: 978-981-287-935-6
Online ISBN: 978-981-287-936-3
eBook Packages: Computer Science, Computer Science (R0)
