Mining and Classifying Images from an Advertisement Image Remover

Annals of Data Science

Abstract

AdEater is an early browsing assistant that automatically removes advertisement images from web pages. It generates rules from training data and applies them while the user browses. Advertisement images are replaced by transparent images displaying the word “ad”; when an image is misclassified, a non-advertisement image is replaced in the same way. This paper critically examines the dataset derived from a trial of AdEater and attempts to build a robust image classifier. We apply data mining techniques to uncover associations between the features of advertisements and non-advertisements, and we use three classification methods to predict whether an image is an advertisement. We achieve a classification accuracy of 96.5%, using k-fold cross-validation to train and test the model.
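As a rough illustration of the k-fold cross-validation mentioned in the abstract, the sketch below deals the observations into k folds (note 17 gives k = 10), trains on k − 1 folds, predicts the held-out fold, and pools all held-out predictions into one accuracy figure. It is a stdlib-only Python sketch, not the author's R code: the helper names and the majority-class placeholder classifier are hypothetical, and pooling held-out predictions is only one of several reasonable ways to summarise cross-validated accuracy.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical interface: trains on (X_train, y_train), returns predictions for X_test.
FitPredict = Callable[[List, List, List], List]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[f::k] for f in range(k)]


def cross_val_accuracy(X: Sequence, y: Sequence, fit_predict: FitPredict,
                       k: int = 10, seed: int = 0) -> float:
    """Pooled accuracy over every held-out prediction from k-fold cross-validation."""
    correct = 0
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct += sum(pred == y[i] for pred, i in zip(predictions, test_idx))
    return correct / len(y)


def majority_class(X_train: List, y_train: List, X_test: List) -> List:
    """Hypothetical placeholder classifier: always predict the most common training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * len(X_test)
```

`cross_val_accuracy(X, y, majority_class)` gives a trivial baseline that any real classifier should beat. With k = 10 and roughly 3,280 observations, `k_fold_indices` yields folds of about 328 images each, consistent with note 17.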

Figs. 1–13 (Fig. 1 reproduced with permission from Fiol-Roig et al. [2])

Notes

  1. http://uk.businessinsider.com/interview-with-the-inventor-of-the-ad-blocker-henrik-aasted-srensen-2015-7?r=US&IR=T.

  2. Fiol-Roig et al. [2].

  3. See Electronic Supplementary Material. Full code available upon request.

  4. Agrawal et al. [8].

  5. Absence is indicated by the value 0. To mine rules among absent image features, we recoded absent values (0) as 1 and present values (1) as 0.

  6. For brevity, rule lists longer than 20 rules are not reported. The full lists are available upon request.

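  7. The standardised lift of McNicholas et al. [9]: \( \frac{\mathrm{Lift}(A \to B) - L}{U - L} \), where \( L \) and \( U \) are the lower and upper bounds that the lift of \( A \to B \) can attain.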

  8. The pruning method follows http://www.rdatamining.com/examples/association-rules.

  9. Witten and Frank [10].

  10. https://datascienceplus.com/k-means-clustering-in-r/.

  11. We used R's “binary” distance method to compute distances.

  12. The data exclude the class variable and consist only of the 1554 image features.

  13. Objective function: \( \sum_{i=1}^{n} \sum_{j=1}^{K} z_{ij}\, d(x_i, \mu_j) \), where \( z_{ij} = 1 \) if observation \( x_i \) is assigned to cluster \( j \) with centre \( \mu_j \), and \( z_{ij} = 0 \) otherwise.

  14. See Rand [11]. The Rand Index lies between 0 and 1:

    \( \frac{\binom{n}{2} + 2\sum_{i=1}^{c_1}\sum_{j=1}^{c_2}\binom{n_{ij}}{2} - \left[\sum_{i=1}^{c_1}\binom{n_{i\cdot}}{2} + \sum_{j=1}^{c_2}\binom{n_{\cdot j}}{2}\right]}{\binom{n}{2}} \),

    where \( c_1 \) and \( c_2 \) are the numbers of clusters in the two partitions, \( n_{ij} \) is the number of observations in cluster \( i \) of the first partition and cluster \( j \) of the second, and \( n_{i\cdot} \) and \( n_{\cdot j} \) are the corresponding row and column totals.

  15. Hubert and Arabie [12] developed an adjusted Rand Index.

  16. Other methods, including bagging and random forests, were implemented; however, because of their long run times, they were not included in the final report.

  17. We selected k = 10 to keep computation time manageable, so each fold contains about 328 observations.
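Note 7's standardised lift can be computed from P(A), P(B) and P(A and B). The sketch below is a simplified, hypothetical implementation: it takes U = 1 / max{P(A), P(B)} and builds L from the bounds implied by P(A and B) ≥ max{0, P(A) + P(B) − 1} and, optionally, by a minimum support s and minimum confidence c. McNicholas et al. [9] derive the bounds in full, so treat this as an approximation of their definition rather than a reproduction of it.

```python
def lift(p_a: float, p_b: float, p_ab: float) -> float:
    """Lift of A -> B: P(A and B) / (P(A) P(B))."""
    return p_ab / (p_a * p_b)


def standardised_lift(p_a: float, p_b: float, p_ab: float,
                      min_support: float = 0.0, min_confidence: float = 0.0) -> float:
    """(Lift - L) / (U - L) with simplified bounds (an assumption, see lead-in)."""
    denom = p_a * p_b
    # Lower bound L: P(A and B) >= max(0, P(A) + P(B) - 1); a mined rule also has
    # support >= s, and confidence P(A and B)/P(A) >= c implies lift >= c / P(B).
    lower = max(0.0, (p_a + p_b - 1) / denom, min_support / denom, min_confidence / p_b)
    # Upper bound U: P(A and B) <= min(P(A), P(B)), so lift <= 1 / max(P(A), P(B)).
    upper = 1.0 / max(p_a, p_b)
    if upper <= lower:
        raise ValueError("lift bounds coincide; standardised lift is undefined")
    return (lift(p_a, p_b, p_ab) - lower) / (upper - lower)
```

With P(A) = P(B) = 0.5 and P(A and B) = 0.4, the lift is 1.6, L = 0 and U = 2, so the standardised lift is 0.8: the rule reaches 80% of the range its lift could take.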
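Note 8 points to an external recipe for pruning redundant association rules. One common definition, assumed here and possibly differing in detail from the linked page, treats a rule as redundant when another rule with the same right-hand side and a strictly more general left-hand side (a proper subset) has an equal or higher lift. The `Rule` type, `prune_redundant` and the feature names below are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    lhs: FrozenSet[str]  # antecedent items
    rhs: FrozenSet[str]  # consequent items
    lift: float


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Drop rules for which a more general rule with the same RHS has lift >= theirs."""
    kept = []
    for rule in rules:
        redundant = any(
            other.rhs == rule.rhs
            and other.lhs < rule.lhs  # proper subset: strictly more general antecedent
            and other.lift >= rule.lift
            for other in rules
        )
        if not redundant:
            kept.append(rule)
    return kept


# Hypothetical rules over made-up feature names.
rules = [
    Rule(frozenset({"feat_a"}), frozenset({"ad"}), 2.0),
    Rule(frozenset({"feat_a", "feat_b"}), frozenset({"ad"}), 1.8),  # redundant
    Rule(frozenset({"feat_a", "feat_b"}), frozenset({"nonad"}), 3.0),
    Rule(frozenset({"feat_b"}), frozenset({"ad"}), 1.5),
]
pruned = prune_redundant(rules)
```

The pairwise check is quadratic in the number of rules, which is fine for modest rule sets; very large sets would call for indexing rules by their right-hand side first.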
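Note 11 mentions R's “binary” distance. R's documentation describes it as the proportion of positions in which exactly one vector is non-zero, among positions in which at least one is non-zero (a Jaccard-type distance). A minimal Python equivalent for two 0/1 image-feature vectors might look like this; returning 0.0 when both vectors are entirely zero is a convention assumed for the sketch.

```python
from typing import Sequence


def binary_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Share of 'on' positions (non-zero in x or y) where only one vector is on."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    at_least_one_on = 0
    exactly_one_on = 0
    for a, b in zip(x, y):
        a_on, b_on = a != 0, b != 0
        if a_on or b_on:
            at_least_one_on += 1
            if a_on != b_on:
                exactly_one_on += 1
    if at_least_one_on == 0:
        return 0.0  # both vectors all zero: convention assumed for this sketch
    return exactly_one_on / at_least_one_on
```

For example, `binary_distance([1, 1, 0, 0], [1, 0, 1, 0])` is 2/3: three positions have at least one feature present, and the vectors differ in two of them. Positions where both images lack a feature are ignored, which suits sparse binary features where shared absences carry little information.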
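The objective in note 13 can be made concrete with Lloyd's algorithm, which alternates between assigning each observation to its nearest centre (setting \( z_{ij} \)) and recomputing the centres. This is a hypothetical stdlib sketch, not the R implementation the author used: it assumes \( d \) is squared Euclidean distance (for which the cluster mean is the optimal centre) and naively initialises the centres with the first k observations.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_euclidean(x: Vector, mu: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def kmeans_objective(X: Sequence[Vector], labels: Sequence[int],
                     centres: Sequence[Vector]) -> float:
    """Sum over i, j of z_ij * d(x_i, mu_j); z_ij = 1 only for j = labels[i]."""
    return sum(squared_euclidean(x, centres[j]) for x, j in zip(X, labels))


def lloyd(X: Sequence[Vector], k: int,
          max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    centres = [list(x) for x in X[:k]]  # naive initialisation (assumption)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: z_ij = 1 for the nearest centre j, 0 elsewhere.
        new_labels = [min(range(k), key=lambda j: squared_euclidean(x, centres[j]))
                      for x in X]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres
```

On four points forming two tight pairs, `lloyd(X, 2)` separates the pairs, and `kmeans_objective` then equals the total squared distance from each point to its pair's midpoint. Neither step can increase the objective, which is why the loop terminates.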
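Notes 14 and 15 can be checked numerically. The sketch below builds the contingency table of two labelings (say, cluster labels and the ad/non-ad classes) and evaluates the Rand index from note 14 together with the adjusted Rand index of Hubert and Arabie [12]. The function names are hypothetical, and returning 1.0 when the adjusted index's denominator vanishes (for example, when both partitions are a single cluster) is a convention assumed here.

```python
from collections import Counter
from math import comb
from typing import Sequence, Tuple


def _pair_sums(labels_1: Sequence, labels_2: Sequence) -> Tuple[int, int, int, int]:
    """Return C(n,2), sum C(n_ij,2), sum_i C(n_i.,2) and sum_j C(n_.j,2)."""
    if len(labels_1) != len(labels_2):
        raise ValueError("labelings must have the same length")
    cells = Counter(zip(labels_1, labels_2))  # contingency table n_ij
    rows = Counter(labels_1)                  # row totals n_i.
    cols = Counter(labels_2)                  # column totals n_.j
    return (comb(len(labels_1), 2),
            sum(comb(v, 2) for v in cells.values()),
            sum(comb(v, 2) for v in rows.values()),
            sum(comb(v, 2) for v in cols.values()))


def rand_index(labels_1: Sequence, labels_2: Sequence) -> float:
    total, cells, rows, cols = _pair_sums(labels_1, labels_2)
    return (total + 2 * cells - (rows + cols)) / total


def adjusted_rand_index(labels_1: Sequence, labels_2: Sequence) -> float:
    total, cells, rows, cols = _pair_sums(labels_1, labels_2)
    expected = rows * cols / total  # expected sum of C(n_ij,2) under random labelling
    maximum = (rows + cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case; convention assumed for this sketch
    return (cells - expected) / (maximum - expected)
```

Comparing `[0, 0, 1, 1]` with `[0, 1, 0, 1]` gives a Rand index of 1/3 but an adjusted index of -0.5. The unadjusted index alone does not reveal that this agreement is worse than chance, which is what Hubert and Arabie's correction makes visible.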

References

  1. Kushmerick N (1999) Learning to remove internet advertisements. In: Agents’99, proceedings of the third annual conference on autonomous agents, pp 175–181

  2. Fiol-Roig G, Miró-Julià M, Herraiz E (2011) Data mining techniques for web page classification. In: Highlights in practical applications of agents and multiagent systems, pp 61–68. https://link.springer.com/chapter/10.1007/978-3-642-19917-2_8

  3. Iyengar V, Apte C, Zhang T (2000) Active learning using adaptive resampling. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining, pp 91–98. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.34.5070&rep=rep1&type=pdf

  4. Alvarez S, Kawato T, Ruiz C (2003) Mining over loosely coupled data sources using neural experts. Computer Science Department, Boston College, Boston. https://pdfs.semanticscholar.org/bf09/0b728a3798fe4f95cc009590674fa555c316.pdf

  5. Cohen S, Ruppin E, Dror G (2005) Feature selection based on the Shapley value. In: Proceedings of the 19th international joint conference on artificial intelligence, pp 665–670

  6. Li Z, Wang Y, Bing Y (2005) Advertisement image detection. Manuscript http://www.cas.mcmaster.ca/~wangy22/public/FinalReport.pdf

  7. Almonte I, Anden R, Schwarzbek S (2012) Evaluating machine learning classification methods for internet advertising data. Manuscript http://www.irvinalmonte.com/wp-content/uploads/2016/10/IrvinAlmonte_MachineLearning_CollaborativeSample_R.pdf

  8. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data (SIGMOD'93), p 207. https://doi.org/10.1145/170035.170072

  9. McNicholas PD, Murphy TB, O’Regan M (2008) Standardising the lift of an association rule. Comput Stat Data Anal 52(10):4712–4721

  10. Witten IH, Frank E (2000) Data mining: practical machine learning tools and techniques with Java implementations. Academic Press, New York

  11. Rand WM (1971) Objective criteria for the evaluation of clustering methods. J Am Stat Assoc 66(336):846–850

  12. Hubert L, Arabie P (1985) Comparing partitions. J Classif 2(1):193–218


Author information

Correspondence to Graeme O’Meara.

Electronic supplementary material

Supplementary material 1 (CSV 34 kb)

About this article

Cite this article

O’Meara, G. Mining and Classifying Images from an Advertisement Image Remover. Ann. Data. Sci. 6, 279–303 (2019). https://doi.org/10.1007/s40745-018-0164-1

  • DOI: https://doi.org/10.1007/s40745-018-0164-1
