Data Sampling Methods for Analyzing Publishers Conduct from Highly Imbalanced Dataset in Web Advertising

Sisodia, Deepti; Sisodia, Dilip Singh

doi:10.1007/978-3-031-13150-9_34

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 521)

Included in the following conference series:

International Conference on Information Systems and Management Science

Abstract

Because it involves a huge amount of expenditure, the Pay-per-Click digital advertising model is affected by click fraud. Illegitimate publishers are far rarer than legitimate ones, so the class distribution is skewed, which in turn biases the learning model towards the majority class. Moreover, publishers dynamically disguise their activities, which makes it harder to determine their actual status labels and results in less accurate predictions. Features reported in the literature detect click fraud by analyzing publishers' conduct but cannot cope with publishers' changing behavior. We therefore propose a framework that addresses two challenging issues: (a) eight new conditional features are proposed to better capture the dynamics of click fraud behavior by merging two or more attributes over publishers on finer-grained time intervals; and (b) the impact of data sampling methods on learners' performance is examined for the effective identification of publishers' activity. The performance of the classification algorithms is assessed using average precision, recall, and F-measure and validated using 10-fold cross-validation. Experimental results illustrate the potential of the proposed features over a balanced dataset in improving the average precision of the learners in discriminating fraudsters from genuine publishers.
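The abstract does not say which sampling methods or classifiers were used, so the following is only a minimal sketch of the general idea, not the authors' method. It uses random oversampling as a stand-in for "data sampling" and computes precision, recall, and F-measure for one class. The function names and the labels "OK" and "Fraud" are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class.

    Illustrative stand-in only: the abstract does not name the sampling
    methods that were actually evaluated.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for the class named `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical toy data: 8 legitimate publishers, 2 fraudulent ones.
    rows = [[i] for i in range(10)]
    labels = ["OK"] * 8 + ["Fraud"] * 2
    _, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))

    y_true = ["Fraud", "Fraud", "OK", "OK", "OK"]
    y_pred = ["Fraud", "OK", "Fraud", "OK", "OK"]
    print(precision_recall_f1(y_true, y_pred, positive="Fraud"))
```

When sampling is combined with k-fold cross-validation, it is usually applied to the training folds only, so that duplicated minority rows do not leak into the held-out fold. Whether the chapter follows this practice is not stated in the abstract.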

Author information

Authors and Affiliations

National Institute of Technology, Raipur, India
Deepti Sisodia & Dilip Singh Sisodia

Corresponding author

Correspondence to Deepti Sisodia.

Editor information

Editors and Affiliations

Department of Computer Information Systems, University of Malta, Msida, Malta
Lalit Garg
Computer Science and Engineering, National Institute of Technology, Chhattisgarh, India
Dilip Singh Sisodia
Central University of Rajasthan, Ajmer, Rajasthan, India
Nishtha Kesswani
Computer Information Systems Department, University of Malta, Msida, Malta
Joseph G Vella
EMLYON Business School, Écully, France
Imene Brigui
Faculty of Information and Communication Technology, University of Malta, Msida, Malta
Peter Xuereb
Department of Computer Science and Communication, Ostfold University, Halden, Norway
Sanjay Misra
Computer Science and Engineering, National Institute of Technology, Chhattisgarh, India
Deepak Singh

About this paper

Cite this paper

Sisodia, D., Sisodia, D.S. (2023). Data Sampling Methods for Analyzing Publishers Conduct from Highly Imbalanced Dataset in Web Advertising. In: Garg, L., et al. Information Systems and Management Science. ISMS 2021. Lecture Notes in Networks and Systems, vol 521. Springer, Cham. https://doi.org/10.1007/978-3-031-13150-9_34

DOI: https://doi.org/10.1007/978-3-031-13150-9_34
Published: 29 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13149-3
Online ISBN: 978-3-031-13150-9
eBook Packages: Intelligent Technologies and Robotics (R0)
