A novel machine learning approach to rank web forum posts

  • Methodologies and Application
  • Soft Computing

Abstract

User-generated content in Web forums is rich but varies widely in quality, ranging from excellent, detailed opinions to simple repetition of previous posts, or even spam. This makes it difficult to find high-quality information during post browsing, retrieval and other Web forum applications. In this paper, we propose a novel machine learning approach named LGPRank, which uses a genetic programming architecture to rank Web forum posts according to the quality of their content. To address the shortcomings of current studies, we take both the semantic-free and the semantic-specific information of a post into account. We propose a set of new features, named Latent Dirichlet Allocation (LDA) semantic features, which are computed in the LDA topic space. The proposed features, together with content surface features and forum-specific features, are used in the learning process. Experiments are conducted on three Web forum datasets, in comparison with methods used in prior ranking research. LGPRank outperforms all the other methods in terms of the P@N, NDCG@N and MAP measures. Furthermore, the experimental results indicate that the proposed LDA semantic features have a positive effect on ranking performance.
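For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of P@N, NDCG@N and MAP in their common textbook forms. It is not the authors' evaluation code. The function names are illustrative, the exponential gain 2^g - 1 in DCG is one of several conventions, and average precision is normalised by the number of relevant posts that appear in the ranked list, which assumes every relevant post is in that list.

```python
import math
from typing import Sequence


def precision_at_n(relevant: Sequence[bool], n: int) -> float:
    """P@N: fraction of the top-n ranked posts that are relevant."""
    if n <= 0:
        return 0.0
    return sum(relevant[:n]) / n


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of the precision values at each rank holding a relevant post.

    Assumption: all relevant posts appear somewhere in the ranked list.
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """MAP: average precision averaged over several ranked lists."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def dcg_at_n(grades: Sequence[int], n: int) -> float:
    """Discounted cumulative gain with gain 2^g - 1 (an assumed convention)."""
    return sum(
        (2 ** g - 1) / math.log2(rank + 1)
        for rank, g in enumerate(grades[:n], start=1)
    )


def ndcg_at_n(grades: Sequence[int], n: int) -> float:
    """NDCG@N: DCG of the ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_n(sorted(grades, reverse=True), n)
    return dcg_at_n(grades, n) / ideal if ideal > 0 else 0.0
```

Here `relevant` is a list of binary relevance labels in ranked order (best-ranked post first), and `grades` is the corresponding list of graded quality labels, for example integers from 0 to 3.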

Notes

  1. http://www.smth.edu.cn/.

  2. http://groups.google.com.

  3. http://www.nabble.com.

  4. http://discussions.apple.com.

  5. http://ubuntuforums.org.

  6. http://club.china.com//data/threads/1638757/.

Acknowledgments

This work is supported by the NSF of China (60970047, 61103151, 61173068), the NSF of Shandong province (ZR2012FM037), and the Doctoral Fund of Ministry of Education of China (20110131110028).

Author information

Corresponding author

Correspondence to Jun Ma.

Additional information

Communicated by E. Viedma.

About this article

Cite this article

Han, X., Ma, J., Wu, Y. et al. A novel machine learning approach to rank web forum posts. Soft Comput 18, 941–959 (2014). https://doi.org/10.1007/s00500-013-1113-8

  • DOI: https://doi.org/10.1007/s00500-013-1113-8
