Abstract
Visual similarity of web pages has been a trending topic over the last decade, and effective methods for measuring it are crucial for phishing detection and related problems. In this study, we aim to develop a search engine for web page visual similarity and propose a novel method for capturing and calculating the layout similarity of web pages. To achieve this, web page elements are classified and mapped with a novel technique. In addition, spatial pyramid matching, an extension of the well-known bag-of-features approach, is employed with a histogram intersection scheme to capture and measure both partial and whole-page layout similarity. Promising results demonstrate that the spatial pyramid matching kernel can be used in this field.
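To make the matching step concrete, the following is a minimal sketch of a spatial pyramid match with histogram intersection, following the general formulation of Lazebnik et al. It is not the SimiLay implementation. It assumes that each page has already been rasterised into an integer grid in which 0 marks empty space and 1..num_types mark element classes; that grid representation, the function names, and the normalisation by the self-match scores are all illustrative assumptions.

```python
import numpy as np


def cell_histograms(label_map, num_types, level):
    """Histogram of element types for each cell of a 2^level x 2^level grid."""
    cells = 2 ** level
    row_groups = np.array_split(np.arange(label_map.shape[0]), cells)
    col_groups = np.array_split(np.arange(label_map.shape[1]), cells)
    hists = []
    for rows in row_groups:
        for cols in col_groups:
            block = label_map[np.ix_(rows, cols)]
            counts = np.bincount(block.ravel(), minlength=num_types + 1)
            hists.append(counts[1:num_types + 1])  # drop label 0 (empty space)
    return np.array(hists)


def pyramid_match(map_a, map_b, num_types, levels=2):
    """Weighted sum of histogram intersections over pyramid levels 0..levels."""
    total = 0.0
    for level in range(levels + 1):
        hist_a = cell_histograms(map_a, num_types, level)
        hist_b = cell_histograms(map_b, num_types, level)
        intersection = np.minimum(hist_a, hist_b).sum()
        # Finer levels get larger weights; level 0 shares the coarsest weight.
        if level == 0:
            weight = 1 / 2 ** levels
        else:
            weight = 1 / 2 ** (levels - level + 1)
        total += weight * intersection
    return total


def layout_similarity(map_a, map_b, num_types, levels=2):
    """Pyramid match scaled to [0, 1] by the self-match scores (an assumption)."""
    k_ab = pyramid_match(map_a, map_b, num_types, levels)
    k_aa = pyramid_match(map_a, map_a, num_types, levels)
    k_bb = pyramid_match(map_b, map_b, num_types, levels)
    if k_aa == 0 or k_bb == 0:
        return 0.0  # at least one page has no classified elements
    return float(k_ab / np.sqrt(k_aa * k_bb))


# Hypothetical 8x8 layout: 1 = header, 2 = navigation, 3 = main content.
page_a = np.zeros((8, 8), dtype=int)
page_a[0:2, :] = 1
page_a[2:8, 0:2] = 2
page_a[2:8, 2:8] = 3
page_b = np.fliplr(page_a)  # same blocks, navigation moved to the right

print(layout_similarity(page_a, page_a, num_types=3))
print(layout_similarity(page_a, page_b, num_types=3))
```

Because the coarsest level ignores position entirely, two pages with the same mix of element types still score well above zero, while the finer levels reward matching placement. Comparing sub-grids instead of whole grids would be one way to approximate partial-page matching under the same assumptions.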
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bozkır, A.S., Sezer, E.A. (2014). SimiLay: A Developing Web Page Layout Based Visual Similarity Search Engine. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2014. Lecture Notes in Computer Science, vol 8556. Springer, Cham. https://doi.org/10.1007/978-3-319-08979-9_35
DOI: https://doi.org/10.1007/978-3-319-08979-9_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08978-2
Online ISBN: 978-3-319-08979-9
eBook Packages: Computer Science, Computer Science (R0)