SimiLay: A Developing Web Page Layout Based Visual Similarity Search Engine

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8556)


Web page visual similarity has been a trending topic over the last decade. Effective methods for measuring it are crucial for phishing detection and related problems. In this study, we aim to develop a search engine for web page visual similarity and propose a novel method for capturing and calculating the layout similarity of web pages. To achieve this, web page elements are classified and mapped with a novel technique. An extension of the well-known bag-of-features approach, spatial pyramid matching, is then employed with a histogram intersection scheme to capture and measure both partial and whole-page layout similarity. Promising results demonstrate that the spatial pyramid matching kernel can be used in this field.
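To make the idea of spatial pyramid matching with histogram intersection concrete, the following is a minimal sketch, not the paper's implementation. It assumes each page has already been reduced to a list of (x, y, type_id) tuples, with coordinates normalised to [0, 1) and integer type ids standing in for the element classes; the names `cell_histograms`, `intersection`, and `spatial_pyramid_match` are hypothetical, and the level weights follow the standard spatial pyramid match kernel of Lazebnik et al.

```python
from collections import Counter

def cell_histograms(elements, level):
    """Count elements per (grid cell, element type) at one pyramid level.

    elements: list of (x, y, type_id), x and y normalised to [0, 1).
    Level l splits the page into a 2^l x 2^l grid.
    """
    cells = 2 ** level
    hist = Counter()
    for x, y, type_id in elements:
        cx = min(int(x * cells), cells - 1)
        cy = min(int(y * cells), cells - 1)
        hist[(cx, cy, type_id)] += 1
    return hist

def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(count, h2[key]) for key, count in h1.items())

def spatial_pyramid_match(page_a, page_b, levels=2):
    """Weighted sum of per-level intersections (finer levels weigh more)."""
    inter = [
        intersection(cell_histograms(page_a, l), cell_histograms(page_b, l))
        for l in range(levels + 1)
    ]
    score = inter[0] / 2 ** levels
    for l in range(1, levels + 1):
        score += inter[l] / 2 ** (levels - l + 1)
    return score

# Assumed toy data: type 0 = text block, type 1 = image (illustrative only).
page_a = [(0.1, 0.1, 0), (0.6, 0.1, 1), (0.6, 0.6, 0)]
page_b = [(0.9, 0.1, 0), (0.4, 0.1, 1), (0.4, 0.6, 0)]  # mirrored layout

self_score = spatial_pyramid_match(page_a, page_a)
mirror_score = spatial_pyramid_match(page_a, page_b)
```

With these weights, a page matched against itself scores its element count, so dividing by that count gives a similarity in [0, 1]. The mirrored page contains the same element types but in different cells, so only the coarsest level contributes to its score.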


Web page visual similarity; spatial pyramid match kernel; bag of words





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Computer Science and Engineering Department, Hacettepe University, Ankara, Turkey
