Keywords
- Basic Block
- Content Structure
- Semantic Structure
- Passage Retrieval
- Page Segmentation
Copyright information
© 2007 Springer-Verlag London Limited
About this chapter
Cite this chapter
Mehta, R.R., Karnick, H., Mitra, P. (2007). Semantic Structure Analysis of Web Documents. In: Chaudhuri, B.B. (ed.) Digital Document Processing. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84628-726-8_19
DOI: https://doi.org/10.1007/978-1-84628-726-8_19
Publisher Name: Springer, London
Print ISBN: 978-1-84628-501-1
Online ISBN: 978-1-84628-726-8
eBook Packages: Computer Science, Computer Science (R0)
