Detecting Crowdsourcing Spammers in Community Question Answering Websites

Hao, Kaiqing; Wang, Lei

doi:10.1007/978-3-319-59463-7_41

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 6)

Included in the following conference series:

International Conference on Emerging Internetworking, Data & Web Technologies

Abstract

The growth of online crowdsourcing marketplaces has attracted large numbers of ordinary buyers and micro-workers, as well as campaigners and malicious users who post spamming jobs. Because of its significant role in seeking and providing information, Community Question Answering (CQA) has become a target of crowdsourcing spammers. In this paper, we aim to develop a solution for detecting crowdsourcing spammers on CQA websites. Based on ground-truth data, we conduct a hybrid analysis, covering both non-semantic and semantic analysis, with a set of unique features (e.g., profile features, social network features, content features, and linguistic features). Using the proposed features, we develop a supervised machine learning solution for detecting crowdsourcing spammers in CQA. Our method achieves an AUC (area under the receiver operating characteristic curve) of 0.995 and an \(F_{1}\) score of 0.967, which significantly outperforms existing work.
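For readers unfamiliar with the two metrics quoted in the abstract, the following is a minimal sketch of how AUC and the \(F_{1}\) score can be computed from classifier output. It is not the authors' implementation: the function names `f1_score` and `roc_auc`, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration, and the numbers have no connection to the paper's dataset.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive (spammer) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = spammer, 0 = normal user.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
# Assumed decision threshold of 0.5 to turn scores into hard predictions.
preds = [1 if s >= 0.5 else 0 for s in scores]

print(f"F1  = {f1_score(labels, preds):.3f}")
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

AUC is computed from the ranking of scores and does not depend on a threshold, whereas \(F_{1}\) depends on the threshold chosen to produce hard predictions. This is one reason the two values are reported side by side.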

Author information

Authors and Affiliations

Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, School of Software, Dalian University of Technology, Dalian, China
Kaiqing Hao & Lei Wang

Corresponding author

Correspondence to Lei Wang.

Editor information

Editors and Affiliations

Fukuoka Institute of Technology, Fukuoka, Japan
Leonard Barolli
School of Computer Sciences, Hubei University of Technology, Wuhan, China
Mingwu Zhang
Department of Electronic Technology, Key, Engineering University of CAPF, Xi’an, Shaanxi, China
Xu An Wang

About this paper

Cite this paper

Hao, K., Wang, L. (2018). Detecting Crowdsourcing Spammers in Community Question Answering Websites. In: Barolli, L., Zhang, M., Wang, X. (eds) Advances in Internetworking, Data & Web Technologies. EIDWT 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-59463-7_41

DOI: https://doi.org/10.1007/978-3-319-59463-7_41
Published: 28 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59462-0
Online ISBN: 978-3-319-59463-7
eBook Packages: Engineering, Engineering (R0)
