Abstract
Web pages consist of different visual segments that serve different purposes. Typical structural segments are the header, right or left columns, and the main content. Segments can also be nested, meaning that some segments contain other segments. Understanding these segments is important for properly displaying web pages on small-screen devices and for presenting them in alternative forms, such as audio for screen reader users. Several techniques exist for identifying the visual segments of a web page. One successful approach is the Vision Based Page Segmentation Algorithm (VIPS), which uses both the underlying source code and the visual rendering of a web page. However, this approach has some limitations, and this paper explains how we have extended and improved VIPS and implemented it in Java. We have also conducted online user evaluations to investigate how people perceive the success of the segmentation approach and at which granularity they prefer to see a web page segmented. This paper presents the preliminary results, which show that people perceive segmentation with higher granularity as better segmentation, regardless of web page complexity.
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Akpınar, M.E., Yeşilada, Y. (2013). Vision Based Page Segmentation Algorithm: Extended and Perceived Success. In: Sheng, Q.Z., Kjeldskov, J. (eds) Current Trends in Web Engineering. ICWE 2013. Lecture Notes in Computer Science, vol 8295. Springer, Cham. https://doi.org/10.1007/978-3-319-04244-2_22
DOI: https://doi.org/10.1007/978-3-319-04244-2_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04243-5
Online ISBN: 978-3-319-04244-2
eBook Packages: Computer Science (R0)