Abstract
AdEater is an early browsing assistant that automatically removes advertisement images from web pages. It generates rules from training data and applies them as the user browses. Advertisement images are replaced by transparent images displaying the word "ad"; where an image is misclassified, non-advertisement images on a page are replaced in the same way. This paper critically examines the dataset derived from a trial of AdEater and builds a robust image classifier. We apply data mining techniques to uncover associations between the features of advertisements and non-advertisements, and we predict whether images are advertisements or non-advertisements using three classification methods. Using k-fold cross-validation to train and test the models, we achieve a classification accuracy of 96.5%.
Notes
Fiol-Roig et al. [2].
See Electronic Supplementary Material. Full code available upon request.
Agrawal et al. [8].
As indicated by the value 0. We converted these values to 1, and the present values (indicated by 1) to 0, in order to mine rules among absent image features.
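The recoding described in this note can be sketched as follows. The paper's analysis was carried out in R; this Python sketch uses hypothetical feature names for illustration.

```python
# Flip binary indicator features so that "absent" (0) becomes 1 and
# "present" (1) becomes 0, allowing association rules to be mined
# over absences rather than presences.
def invert_binary_features(rows):
    """rows: list of dicts mapping feature name -> 0/1 indicator."""
    return [{name: 1 - value for name, value in row.items()} for row in rows]

# Hypothetical image-feature indicators, not taken from the dataset.
images = [
    {"url_contains_ads": 1, "alt_contains_click": 0},
    {"url_contains_ads": 0, "alt_contains_click": 1},
]
inverted = invert_binary_features(images)
print(inverted[0])  # {'url_contains_ads': 0, 'alt_contains_click': 1}
```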
For brevity, lists of more than 20 rules are not reported; these rules are available upon request.
\( \frac{\mathrm{Lift}(A \to B) - L}{U - L} \).
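The standardised lift in this note (McNicholas et al. [9]) rescales the lift of a rule to [0, 1] using its attainable lower and upper bounds L and U. A minimal sketch, with illustrative support values and bounds chosen for the example rather than taken from the paper:

```python
def lift(support_ab, support_a, support_b):
    """Lift(A -> B) = P(A and B) / (P(A) * P(B))."""
    return support_ab / (support_a * support_b)

def standardised_lift(lift_value, lower, upper):
    """Rescale lift to [0, 1] via its attainable bounds L and U:
    (Lift(A -> B) - L) / (U - L)."""
    return (lift_value - lower) / (upper - lower)

raw = lift(0.2, 0.4, 0.25)                     # 0.2 / (0.4 * 0.25) = 2.0
print(standardised_lift(raw, lower=1.0, upper=4.0))  # (2 - 1) / (4 - 1) = 0.333...
```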
Method for pruning sourced here: http://www.rdatamining.com/examples/association-rules.
Witten and Frank [10].
We applied the “binary” method in computing the distance in R.
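R's "binary" distance treats each vector as presence/absence data: among positions where at least one of the two vectors is nonzero, it is the fraction where exactly one is nonzero (the Jaccard distance). A Python sketch of that computation:

```python
def binary_distance(x, y):
    """R's dist(method = "binary"): among positions where at least one
    vector is nonzero, the fraction where exactly one is nonzero."""
    both_nonzero = sum(1 for a, b in zip(x, y) if a and b)
    any_nonzero = sum(1 for a, b in zip(x, y) if a or b)
    if any_nonzero == 0:
        return 0.0  # R itself returns NaN here; 0.0 keeps the sketch simple
    return 1 - both_nonzero / any_nonzero

print(binary_distance([1, 0, 1, 1], [1, 1, 0, 0]))  # 1 - 1/4 = 0.75
```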
Note that the data excludes the classifier variable and relates only to the 1554 image features.
Objective function: \( \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij}\, d\left( x_{i}, \mu_{j} \right) \).
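The objective sums, over all points, the distance from each point to the centre of its assigned cluster (\(z_{ij} = 1\) iff point \(i\) is assigned to cluster \(j\)). A minimal sketch; squared Euclidean distance is used here purely for illustration, while the clustering in the paper used the binary distance described above.

```python
def kmeans_objective(points, centers, assignments, d):
    """Sum over i, j of z_ij * d(x_i, mu_j), where z_ij = 1 iff
    point i is assigned to cluster j (assignments[i] == j)."""
    return sum(d(points[i], centers[assignments[i]]) for i in range(len(points)))

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
ctrs = [(0.5, 0.0), (10.0, 0.0)]
print(kmeans_objective(pts, ctrs, [0, 0, 1], squared_euclidean))  # 0.25 + 0.25 + 0 = 0.5
```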
See Rand [11]. The Rand Index is between 0 and 1:
\( \frac{\binom{n}{2} + 2\sum_{i=1}^{c_{1}} \sum_{j=1}^{c_{2}} \binom{n_{ij}}{2} - \left[ \sum_{i=1}^{c_{1}} \binom{n_{i\cdot}}{2} + \sum_{j=1}^{c_{2}} \binom{n_{\cdot j}}{2} \right]}{\binom{n}{2}} \).
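The Rand Index can be computed directly from the contingency table \(n_{ij}\) of the two partitions, with \(n_{i\cdot}\) and \(n_{\cdot j}\) the row and column totals. A minimal sketch of that formula:

```python
from math import comb

def rand_index(table):
    """Rand Index from the contingency table of two partitions.
    table[i][j] = number of objects in cluster i of partition 1
    and cluster j of partition 2."""
    n = sum(sum(row) for row in table)
    sum_ij = sum(comb(nij, 2) for row in table for nij in row)
    sum_i = sum(comb(sum(row), 2) for row in table)        # row totals n_i.
    sum_j = sum(comb(sum(col), 2) for col in zip(*table))  # column totals n_.j
    return (comb(n, 2) + 2 * sum_ij - (sum_i + sum_j)) / comb(n, 2)

# Identical partitions give a Rand Index of 1.
print(rand_index([[2, 0], [0, 3]]))  # 1.0
```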
Hubert and Arabie [12] developed an adjusted Rand Index.
Other methods, including bagging and random forests, were also implemented; however, owing to their run times, they could not be included in the final report.
We selected 10 folds for run-time reasons, so each fold contains about 328 observations.
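The fold sizes quoted in this note follow from splitting the dataset into 10 nearly equal parts. A sketch, assuming the 3279-instance dataset implied by the note's "about 328 observations" per fold:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(3279, 10)
print([len(f) for f in folds])  # [328, 328, 328, 328, 328, 328, 328, 328, 328, 327]
```

In practice each fold would be drawn from a shuffled index order; contiguous blocks are used here only to keep the sketch short.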
References
Kushmerick N (1999) Learning to remove internet advertisements. In: Agents’99, proceedings of the third annual conference on autonomous agents, pp 175–181
Fiol-Roig G, Miró-Julià M, Herraiz E (2011) Data mining techniques for web page classification. In: Highlights in practical applications of agents and multiagent systems, pp 61–68. https://link.springer.com/chapter/10.1007/978-3-642-19917-2_8
Iyengar V, Apte C, Zhang T (2000) Active learning using adaptive resampling. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, 2000, pp 91–98. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.5070&rep=rep1&type=pdf
Alvarez S, Kawato T, Ruiz C (2003) Mining over loosely coupled data sources using neural experts. Computer Science Department, Boston College, Boston. https://pdfs.semanticscholar.org/bf09/0b728a3798fe4f95cc009590674fa555c316.pdf
Cohen S, Ruppin E, Dror G (2005) Feature selection based on the Shapley value. In: Proceedings of the 19th international joint conference on artificial intelligence, pp 665–670
Li Z, Wang Y, Bing Y (2005) Advertisement image detection. Manuscript. http://www.cas.mcmaster.ca/~wangy22/public/FinalReport.pdf
Almonte I, Anden R, Schwarzbek S (2012) Evaluating machine learning classification methods for internet advertising data. Manuscript http://www.irvinalmonte.com/wp-content/uploads/2016/10/IrvinAlmonte_MachineLearning_CollaborativeSample_R.pdf
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data—SIGMOD'93. p 207. https://doi.org/10.1145/170035.170072
McNicholas PD, Murphy TB, O'Regan M (2008) Standardising the lift of an association rule. Comput Stat Data Anal 52(10):4712–4721
Witten IH, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Academic Press, New York
Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850
Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218
O’Meara, G. Mining and Classifying Images from an Advertisement Image Remover. Ann. Data. Sci. 6, 279–303 (2019). https://doi.org/10.1007/s40745-018-0164-1