On Ups and Downs in Analyzing Web Activity Data: Notes from a Project

Owsiński, Jan W.; Gajewski, Marek; Hryniewicz, Olgierd; Jastrzębska, Agnieszka; Kozakiewicz, Mariusz; Opara, Karol; Zadrożny, Sławomir; Zwierzchowski, Tomasz

doi:10.1007/978-981-19-8094-7_37

Jan W. Owsiński ORCID: orcid.org/0000-0002-2750-6584

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 333)

Included in the following conference series:

International Symposium on Intelligent Informatics

Abstract

Analyzing data from the web is now one of the primary tasks, understood in many different ways and pursued for a wide variety of purposes. The talk describes the experience from a project devoted to analyzing such data and draws some more general conclusions. The project aimed at distinguishing artificial ad-related traffic from genuine traffic. The rationale is simple: the flow of money depends on the number of clicks on, or views of, an ad, so fake clicking shifts the market to the benefit of some and to the loss of others. The talk describes the problem and its conceptual framing, as well as a number of technical details involving the issues and techniques of (1) variable analysis and choice; (2) clustering; (3) classification/classifiers; and (4) potential hybrid techniques, citing the most interesting results. These often imply definite general conclusions, some of them quite surprising.

The work reported was carried out within the project ABTShield, led by EDGE NPD Co. Ltd.

Notes

1. The best illustration is provided by the most recent sanctions against Russia in the context of its aggression against Ukraine: one of the key issues concerned the banking system and the ability to perform transactions.
2. The fact that the categorization is dichotomous or trichotomous does not mean that the problem is simple; see, for example, the task of identifying irony or sarcasm in expressions found on the web.
3. In this sequence of steps, we concentrate on the cognitive aspect, but, of course, the business side (expected costs and benefits) normally has to be accounted for on a par.
4. Think of various functions, aggregates, statistical representations, etc., of the raw data.
5. In the same step: is there any common-sense (even if very rough) approach to the problem?
6. We definitely believe that science should lead to truth, but most often, out of necessity, truth is only approximated.
7. Web users are, as a rule, not aware that when they move to a web page meant to provide advertising content, their properties (as expressed, in particular, through “cookies”) guide the flash auction that determines the advertising material they will actually see.
8. We set aside crawlers and bots with no “negative” objectives, such as those gathering statistical data.

Author information

Authors and Affiliations

Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01447, Warsaw, Poland
Jan W. Owsiński, Marek Gajewski, Olgierd Hryniewicz, Agnieszka Jastrzębska, Karol Opara & Sławomir Zadrożny
EDGE NPD Co. Ltd, Warsaw, Poland
Mariusz Kozakiewicz & Tomasz Zwierzchowski

Corresponding author

Correspondence to Jan W. Owsiński.

Editor information

Editors and Affiliations

School of Computer Science & Engineering (SoCSE), Innovation and Technology (KUDSIT), Kerala University of Digital Sciences, Trivandrum, Kerala, India
Sabu M. Thampi
Dept of Computer Science & Engineering, Indian Inst of Technology Kharagpur, Kharagpur, India
Jayanta Mukhopadhyay
PAN, Systems Research Institute, Warszawa, Poland
Marcin Paprzycki
Department of Computer Science and Information Engineering (CSIE), Providence University, Taichung, Taiwan
Kuan-Ching Li

Cite this paper

Owsiński, J.W. et al. (2023). On Ups and Downs in Analyzing Web Activity Data: Notes from a Project. In: Thampi, S.M., Mukhopadhyay, J., Paprzycki, M., Li, KC. (eds) International Symposium on Intelligent Informatics. ISI 2022. Smart Innovation, Systems and Technologies, vol 333. Springer, Singapore. https://doi.org/10.1007/978-981-19-8094-7_37

DOI: https://doi.org/10.1007/978-981-19-8094-7_37
Published: 05 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8093-0
Online ISBN: 978-981-19-8094-7
eBook Packages: Engineering, Engineering (R0)
