Abstract
Because large sums of advertising expenditure are involved, the pay-per-click digital advertising model is affected by click fraud. Illegitimate publishers are far rarer than legitimate ones, so the class distribution is skewed, which in turn biases a learning model towards the majority class. Moreover, publishers' dynamic disguise activities make it harder to determine their actual status labels, resulting in less accurate predictions. Features reported in the literature detect click fraud by analyzing publishers' conduct but cannot cope with publishers' changing behavior. We therefore propose a framework that addresses two challenging issues: (a) eight new conditional features, built by merging two or more attributes per publisher over finer-grained time intervals, are proposed to better capture the dynamics of click fraud behavior; and (b) the impact of data sampling methods on learners' performance is examined for the effective identification of publishers' activity. The performance of the classification algorithms is assessed using average precision, recall, and F-measure and validated using 10-fold cross-validation. Experimental results illustrate the potential of the proposed features, over a balanced dataset, to improve the average precision of the learners in discriminating fraudsters from genuine publishers.
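As a rough illustration of the two ingredients named above, class rebalancing and precision/recall/F-measure scoring, the following is a minimal Python sketch. The abstract does not name the specific sampling methods studied, so plain random oversampling is used here as a stand-in; the function names `random_oversample` and `precision_recall_f1` and the toy data are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest one (assumed stand-in sampler)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (hypothetical): 8 genuine publishers (0), 2 fraudulent (1).
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y)
balanced_counts = Counter(y_bal)

# Scoring a hypothetical set of predictions against true labels.
scores = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
```

When resampling is combined with k-fold cross-validation, it is usually applied to the training folds only, so that duplicated minority rows do not leak into the held-out fold and inflate the scores.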
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sisodia, D., Sisodia, D.S. (2023). Data Sampling Methods for Analyzing Publishers Conduct from Highly Imbalanced Dataset in Web Advertising. In: Garg, L., et al. Information Systems and Management Science. ISMS 2021. Lecture Notes in Networks and Systems, vol 521. Springer, Cham. https://doi.org/10.1007/978-3-031-13150-9_34
DOI: https://doi.org/10.1007/978-3-031-13150-9_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13149-3
Online ISBN: 978-3-031-13150-9
eBook Packages: Intelligent Technologies and Robotics (R0)