Detecting Crowdsourcing Spammers in Community Question Answering Websites

  • Conference paper
  • Advances in Internetworking, Data & Web Technologies (EIDWT 2017)

Abstract

The growth of online crowdsourcing marketplaces has attracted large numbers of ordinary buyers and micro-workers, but also campaigners and malicious users who post spamming jobs. Because of its significant role in information seeking and provision, Community Question Answering (CQA) has become a target of crowdsourcing spammers. In this paper, we develop a solution for detecting crowdsourcing spammers on CQA websites. Using ground-truth data, we conduct a hybrid analysis that combines non-semantic and semantic analysis over a set of unique features (e.g., profile features, social network features, content features and linguistic features). With these features, we build a supervised machine learning detector for crowdsourcing spammers in CQA. Our method achieves an AUC (area under the receiver operating characteristic curve) of 0.995 and an \(F_{1}\) score of 0.967, significantly outperforming existing work.
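For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how an \(F_{1}\) score and an AUC value can be computed for a binary spammer/non-spammer classifier. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative, with ties counted as one half (the rank-based formulation)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = spammer, 0 = normal user.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]           # assumed classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

print("F1 :", f1_score(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

The \(F_{1}\) score depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two are commonly reported together.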



Author information

Corresponding author

Correspondence to Lei Wang.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Hao, K., Wang, L. (2018). Detecting Crowdsourcing Spammers in Community Question Answering Websites. In: Barolli, L., Zhang, M., Wang, X. (eds) Advances in Internetworking, Data & Web Technologies. EIDWT 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-59463-7_41

  • DOI: https://doi.org/10.1007/978-3-319-59463-7_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59462-0

  • Online ISBN: 978-3-319-59463-7

  • eBook Packages: Engineering, Engineering (R0)
