Abstract
Catalogues play an important role in most current Web search engines. These catalogues, which organize documents into hierarchical collections, are maintained manually, with increasing difficulty and cost due to the incessant growth of the WWW. This problem has stimulated many researchers to work on the automatic categorization of Web documents. In practice, most of these approaches work well only on special types of documents or on restricted sets of documents. This paper presents an evolutionary approach that automatically constructs the catalogue and classifies Web documents. This functionality relies on a genetic-based fuzzy clustering methodology that applies clustering to the context of the document, as opposed to content-based clustering, which works on the complete document information.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Loia, V., Luongo, P. (2001). An Evolutionary Approach to Automatic Web Page Categorization and Updating. In: Zhong, N., Yao, Y., Liu, J., Ohsuga, S. (eds) Web Intelligence: Research and Development. WI 2001. Lecture Notes in Computer Science, vol 2198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45490-X_35
DOI: https://doi.org/10.1007/3-540-45490-X_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42730-8
Online ISBN: 978-3-540-45490-8
eBook Packages: Springer Book Archive