On Ups and Downs in Analyzing Web Activity Data: Notes from a Project

  • Conference paper
International Symposium on Intelligent Informatics (ISI 2022)

Abstract

Analyzing data from the web is now one of the primary analytical tasks, understood in many different ways and undertaken for a very wide variety of purposes. The talk describes the experience gained in a project devoted to analyzing such data, and draws some more general conclusions from it. The project aimed at distinguishing artificial ad-related traffic from genuine traffic. The rationale is simple: the flow of money depends upon the number of clicks on, or views of, an ad, so fake clicking shifts the market to the benefit of some participants and to the loss of others. The talk describes the problem and its conceptual framing, as well as a number of technical details involving the issues and techniques of (1) variable analysis and choice; (2) clustering; (3) classification/classifiers; and (4) potential hybrid techniques, citing the most interesting results. These often imply definite general conclusions, some of them quite surprising.

The work reported was carried out within the project ABTShield, led by EDGE NPD Co. Ltd.

Notes

  1. The best illustration is provided by the most recent sanctions against Russia in the context of its aggression against Ukraine: one of the key issues concerned the banking system and the possibility of performing transactions.

  2. The fact that the categorization is dichotomous or trichotomous does not mean that the problem is simple; see the task of identifying irony or sarcasm in expressions found on the web.

  3. In this sequence of steps, we concentrate on the cognitive aspect, but, of course, the business side (expected costs and benefits) must normally be accounted for on a par.

  4. Think of various functions, aggregates, statistical representations, etc., of the raw data.

  5. In the same step: is there any common-sense (even if very rough) approach to the problem?

  6. We definitely believe that science should lead to truth, but, out of necessity, the truth is most often only approximated.

  7. Web users are, as a rule, not aware that when they move to a given web page that is supposed to provide advertising content, their properties (as expressed, in particular, through “cookies”) guide the flash auction that determines the advertising material they will actually see.

  8. We set aside the crawlers and bots with no “negative” objectives, such as those gathering statistical data.

Author information

Corresponding author

Correspondence to Jan W. Owsiński.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Owsiński, J.W. et al. (2023). On Ups and Downs in Analyzing Web Activity Data: Notes from a Project. In: Thampi, S.M., Mukhopadhyay, J., Paprzycki, M., Li, KC. (eds) International Symposium on Intelligent Informatics. ISI 2022. Smart Innovation, Systems and Technologies, vol 333. Springer, Singapore. https://doi.org/10.1007/978-981-19-8094-7_37

  • DOI: https://doi.org/10.1007/978-981-19-8094-7_37

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8093-0

  • Online ISBN: 978-981-19-8094-7

  • eBook Packages: Engineering, Engineering (R0)
