Classifying Forum Questions Using PCA and Machine Learning for Improving Online CQA

  • Conference paper
Soft Computing in Data Science (SCDS 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 545)

Abstract

As one of the most popular e-Business models, community question answering (CQA) services gather an increasingly large amount of knowledge through the voluntary contributions of online communities across the globe. While most questions in CQA receive an answer from peer users, the number of unanswered or ignored questions has risen sharply in the past few years. Understanding the factors that lead to questions being answered, and those that lead to questions being ignored, can help forum users improve the quality of their questions and increase their chances of getting answers. In this study, a feature selection method called Principal Component Analysis (PCA) was used to extract the factors, or components, of the features. Data mining techniques were then used to identify the relevant features that help predict the quality of questions.
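
The two-stage idea in the abstract (PCA first, then a classifier on the resulting components) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the preview does not list the actual features, dataset, or classifiers, so the four question features, the synthetic data, and the nearest-centroid classifier are all hypothetical stand-ins. PCA is computed here by power iteration with deflation so that the sketch needs only the Python standard library.

```python
import math
import random

def fit_scaler(rows):
    """Per-feature means and sample standard deviations."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return means, stds

def scale(rows, means, stds):
    """Standardize each feature with the given means and deviations."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def principal_components(cov, k, iters=500):
    """Top-k eigenvectors of cov via power iteration with deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]
    components = []
    for _ in range(k):
        v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start
        for _step in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
        components.append(v)
        # Deflate: remove the component just found before the next pass.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return components

def project(rows, components):
    """Coordinates of each row along each principal component."""
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in rows]

def fit_centroids(points, labels):
    """Mean point of each class (a stand-in for the paper's classifiers)."""
    centroids = {}
    for label in set(labels):
        members = [p for p, y in zip(points, labels) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def predict(points, centroids):
    """Assign each point to the class with the nearest centroid."""
    return [min(centroids, key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def make_questions(rng, n_per_class):
    """Synthetic data. Hypothetical features: title length, body length,
    number of tags, lines of code. Not taken from the paper."""
    spread = (8, 150, 0.7, 3)
    classes = (("answered", (60, 800, 3, 12)), ("ignored", (25, 200, 1, 2)))
    rows, labels = [], []
    for label, centre in classes:
        for _ in range(n_per_class):
            rows.append([rng.gauss(c, s) for c, s in zip(centre, spread)])
            labels.append(label)
    return rows, labels

rng = random.Random(0)
train_x, train_y = make_questions(rng, 100)
test_x, test_y = make_questions(rng, 50)

means, stds = fit_scaler(train_x)
train_scaled = scale(train_x, means, stds)
components = principal_components(covariance(train_scaled), k=2)
centroids = fit_centroids(project(train_scaled, components), train_y)

# The test set reuses the scaler and components fitted on the training set.
predicted = predict(project(scale(test_x, means, stds), components), centroids)
accuracy = sum(p == y for p, y in zip(predicted, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

In practice a library implementation such as scikit-learn's `PCA` would normally replace the hand-written power iteration, and the magnitudes of each component's loadings indicate which original features contribute most to it.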

Author information

Corresponding author

Correspondence to Simon Fong.

Copyright information

© 2015 Springer Science+Business Media Singapore

About this paper

Cite this paper

Fong, S., Zhuang, Y., Liu, K., Zhou, S. (2015). Classifying Forum Questions Using PCA and Machine Learning for Improving Online CQA. In: Berry, M., Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2015. Communications in Computer and Information Science, vol 545. Springer, Singapore. https://doi.org/10.1007/978-981-287-936-3_2

  • DOI: https://doi.org/10.1007/978-981-287-936-3_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-287-935-6

  • Online ISBN: 978-981-287-936-3

  • eBook Packages: Computer Science, Computer Science (R0)
