Abstract
The growth of online crowdsourcing marketplaces has attracted large numbers of ordinary buyers and micro workers, as well as campaigners and malicious users who post spamming jobs. Because of its significant role in seeking and providing information, Community Question Answering (CQA) has become a target of crowdsourcing spammers. In this paper, we aim to develop a solution for detecting crowdsourcing spammers on CQA websites. Based on ground-truth data, we conduct a hybrid analysis that combines non-semantic and semantic analysis over a set of unique features (e.g., profile, social network, content, and linguistic features). Using the proposed features, we develop a supervised machine learning solution for detecting crowdsourcing spammers in CQA. Our method achieves an AUC (area under the receiver operating characteristic curve) of 0.995 and an \(F_{1}\) score of 0.967, significantly outperforming existing work.
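For readers unfamiliar with the two reported metrics, the following minimal sketch shows how AUC and \(F_{1}\) can be computed from a classifier's outputs. It is an illustration only, not the authors' evaluation code: the function names `roc_auc` and `f1_score`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for this example.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = spammer, 0 = normal user (not from the paper).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"AUC = {roc_auc(labels, scores):.3f}")
print(f"F1  = {f1_score(labels, preds):.3f}")
```

AUC depends only on how the scores rank spammers against normal users, whereas \(F_{1}\) depends on the chosen decision threshold, which is why the two are usually reported together.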
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Hao, K., Wang, L. (2018). Detecting Crowdsourcing Spammers in Community Question Answering Websites. In: Barolli, L., Zhang, M., Wang, X. (eds) Advances in Internetworking, Data & Web Technologies. EIDWT 2017. Lecture Notes on Data Engineering and Communications Technologies, vol 6. Springer, Cham. https://doi.org/10.1007/978-3-319-59463-7_41
DOI: https://doi.org/10.1007/978-3-319-59463-7_41
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59462-0
Online ISBN: 978-3-319-59463-7
eBook Packages: Engineering, Engineering (R0)