Data Sampling Methods for Analyzing Publishers Conduct from Highly Imbalanced Dataset in Web Advertising

  • Conference paper
Information Systems and Management Science (ISMS 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 521)

Abstract

Because it involves large expenditures, the pay-per-click (PPC) digital advertising model is a target of click fraud. Illegitimate publishers are far rarer than legitimate ones, which skews the class distribution and in turn biases the learning model towards the majority class. Moreover, publishers' constantly changing disguise tactics make it harder to determine their actual status labels, resulting in less accurate predictions. Features reported in the literature detect click fraud by analyzing publishers' conduct, but they are unable to cope with publishers' changing behavior. We therefore propose a framework that addresses two challenges: (a) eight new conditional features, built by merging two or more attributes per publisher over finer-grained time intervals, to better capture the dynamics of click-fraud behavior; and (b) an examination of the impact of data sampling methods on learners' performance in identifying publisher activity. The performance of the classification algorithms is assessed using average precision, recall, and F-measure, and validated using 10-fold cross-validation. Experimental results show the potential of the proposed features, on a balanced dataset, to improve the learners' average precision in distinguishing fraudsters from genuine publishers.
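The abstract describes rebalancing a skewed publisher dataset, then scoring learners with precision, recall, and F-measure under 10-fold cross-validation. The chapter's exact sampling methods and classifiers are not given here, so the following is only a minimal pure-Python sketch of that kind of pipeline under stated assumptions: random oversampling stands in for the sampling methods studied, resampling is applied only to the training folds (a common precaution so duplicated samples never leak into a test fold), "average" is taken to mean the mean across folds, label `1` marks a fraudulent publisher, and `nearest_centroid` is a hypothetical stand-in learner, not one used in the paper.

```python
import random
from collections import defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every class matches the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for xi in rows + extra:
            X_out.append(xi)
            y_out.append(label)
    return X_out, y_out


def stratified_folds(y, k=10, seed=0):
    """Split indices into k folds that each keep roughly the original class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, yi in enumerate(y):
        by_class[yi].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure for the positive (assumed fraudulent) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Mean precision/recall/F1 over k stratified folds; only the training part is oversampled."""
    folds = stratified_folds(y, k, seed)
    scores = []
    for f, test_idx in enumerate(folds):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        X_tr, y_tr = random_oversample([X[i] for i in train_idx], [y[i] for i in train_idx], seed + f)
        y_pred = fit_predict(X_tr, y_tr, [X[i] for i in test_idx])
        scores.append(precision_recall_f1([y[i] for i in test_idx], y_pred))
    return tuple(sum(s[m] for s in scores) / len(scores) for m in range(3))


def nearest_centroid(X_tr, y_tr, X_te):
    """Hypothetical stand-in learner on one numeric feature: pick the class with the closest mean."""
    means = {}
    for label in set(y_tr):
        vals = [x for x, t in zip(X_tr, y_tr) if t == label]
        means[label] = sum(vals) / len(vals)
    return [min(means, key=lambda c: abs(x - means[c])) for x in X_te]


if __name__ == "__main__":
    # Toy, made-up data: 95 genuine publishers (label 0) and 5 fraudulent ones (label 1)
    X = [float(i) for i in range(95)] + [200.0 + i for i in range(5)]
    y = [0] * 95 + [1] * 5
    print(cross_validate(X, y, nearest_centroid, k=5))  # prints (1.0, 1.0, 1.0)
```

The toy run uses `k=5` rather than 10 because it has only five minority samples; with `k=10`, half the folds would contain no fraudulent publisher, and their recall would be undefined (scored as 0.0 here).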

Author information

Correspondence to Deepti Sisodia.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sisodia, D., Sisodia, D.S. (2023). Data Sampling Methods for Analyzing Publishers Conduct from Highly Imbalanced Dataset in Web Advertising. In: Garg, L., et al. Information Systems and Management Science. ISMS 2021. Lecture Notes in Networks and Systems, vol 521. Springer, Cham. https://doi.org/10.1007/978-3-031-13150-9_34
