Summary
The chapter deals with the use of different soft computing tools to achieve web intelligence. It summarizes the characteristics of web data, the basic components of web mining and its different types, and the current state of the art in each. The reasons for treating web mining as a field separate from data mining are explained. The limitations of some existing web mining methods and tools are set out, and the significance of soft computing (comprising fuzzy logic, artificial neural networks, genetic algorithms and rough sets) is highlighted. A survey of the existing literature on ‘soft web mining’ is provided, together with an overview of commercially available systems. Prospective areas of web mining where the application of soft computing needs immediate attention are outlined, with justification. The scope for future research in developing ‘soft web mining’ systems is discussed. An extensive bibliography is also provided.
References
M. Banerjee, S. Mitra, and S. K. Pal. Rough fuzzy MLP: Knowledge encoding and classification. IEEE Transactions on Neural Networks, 9:1203–1216, 1998.
D. Bikel, R. Schwartz, and R. Weischedel. An algorithm that learns what’s in a name. Machine Learning, 34(1/3):211–231, 1999. Special issue on Natural Language Learning.
M. Boughanem, C. Chrisment, J. Mothe, C. S. Dupuy, and L. Tamine. Connectionist and genetic approaches for information retrieval. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval: Techniques and Applications, volume 50, pages 102–121. Physica-Verlag, Heidelberg, 2000.
M. Boughanem, T. Dkaki, J. Mothe, and C. Soule-Dupuy. Mercure at TREC7. In Proceedings of the 7th Text Retrieval Conference (TREC7), Gaithersburg, MD, 1998.
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International WWW Conference, pages 107–117, Brisbane, Australia, April 1998.
S. Chakrabarti. Data mining for hypertext. ACM SIGKDD Explorations, 1(2):1–11, 2000.
S. Chakrabarti, M. van den Berg, and B. Dom. Focused crawling: A new approach to topic-specific web resource discovery. In Proceedings of the 8th World Wide Web Conference, Toronto, May 1999.
H. Chen, Y. Chung, C. Yang, and M. Ramsey. A smart itsy bitsy spider for the web. Journal of the American Society for Information Science, 49(7):604–618, 1997.
W. W. Cohen. What can we learn from the web? In Proceedings of the 16th International Conference on Machine Learning (ICML-99), pages 515–521, 1999.
R. Cooley, B. Mobasher, and J. Srivastava. Web mining: Information and pattern discovery on the World Wide Web. In Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence, Newport Beach, CA, November 1997.
F. Crestani and G. Pasi, editors. Soft Computing in Information Retrieval: Techniques and Applications, volume 50. Physica-Verlag, Heidelberg, 2000.
D. R. Cutting, D. R. Karger, J. O. Pedersen, and J. W. Tukey. Scatter/Gather: A cluster-based approach to browsing large document collections. In Proceedings of the Fifteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 318–329, June 1992.
C. Drummond, D. Ionescu, and R. Holte. A learning agent that assists the browsing of software libraries. Technical Report TR-95-12, University of Ottawa, 1995.
O. Etzioni. Moving up the information food chain: Deploying softbots on the web. In Proceedings of the Thirteenth National Conference on AI, pages 1322–1326, Portland, OR, 1996.
O. Etzioni and M. Perkowitz. Adaptive web sites: An AI challenge. In Proceedings of Fifteenth National Conference on Artificial Intelligence, Madison, Wisconsin, July 1998.
O. Etzioni, J. Shakes, and M. Langheinrich. Ahoy! The homepage finder. In Proceedings of the Sixth International WWW Conference, Santa Clara, CA, April 1997.
O. Etzioni and O. Zamir. Web document clustering: A feasibility demonstration. In Proceedings of the 21st Annual International ACM SIGIR Conference, pages 46–54, 1998.
D. Freitag. Information extraction from HTML: Application of a general machine learning approach. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), pages 517–523, 1998.
D. Freitag and N. Kushmerick. Boosted wrapper induction. In Proceedings of AAAI, pages 577–583, 2000.
D. Freitag and A. McCallum. Information extraction with HMMs and shrinkage. In Proceedings of the AAAI-99 Workshop on Machine Learning for Information Extraction, Orlando, FL, 1999.
H. Fukuda, E.L.P. Passos, A. M. Pacheco, L. B. Neto, J. Valerio, V. Jr. De Roberto, E. R. Antonio, and L. Chigener. Web text mining using a hybrid system. In Proceedings of the Sixth Brazilian Symposium on Neural Networks, pages 131–136, 2000.
T. Gedeon and L. Koczy. A model of intelligent information retrieval using fuzzy tolerance relations based on hierarchical co-occurrence of words. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval: Techniques and Applications, volume 50, pages 48–74. Physica-Verlag, Heidelberg, 2000.
R. Ghani, R. Jones, D. Mladenic, K. Nigam, and S. Slattery. Data mining on symbolic knowledge extracted from the web. In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) Workshop on Text Mining, Boston, MA, pages 29–36, August 2000.
D. Gibson. Inferring web communities from link topologies. In UK conference on Hypertext, 1998.
M. D. Gordon. Probabilistic and genetic algorithms for document retrieval. Communications of the ACM, 31(10):208–218, 1988.
A. Gyenesei. A fuzzy approach for mining quantitative association rules. TUCS Technical Report 336, Department of Computer Science, University of Turku, Lemminkäisenkatu 14, Finland, March 2000.
J. Hipp, U. Güntzer, and J. Nakhaeizadeh. Algorithms for association rule mining: A general survey and comparison. ACM SIGKDD Explorations, 2(1):58–65, July 2000.
A. Joshi and R. Krishnapuram. Robust fuzzy clustering methods to support web mining. In Proceedings of the Workshop on Data Mining and Knowledge Discovery, SIGMOD, pages 15-1 to 15-8, 1998.
H. Kargupta. The gene expression messy genetic algorithm. In Proceedings of the IEEE International Conference on Evolutionary Computation, pages 631–636, Nagoya University, Japan, 1996.
H. Kargupta, B. H. Park, D. Hershberger, and E. Johnson. Collective data mining: A new perspective toward distributed data mining. In Advances in Distributed and Parallel Knowledge Discovery. MIT/AAAI Press, 1999.
S. Kawasaki, N. Binh Nguyen, and T. Bao Ho. Hierarchical document clustering based on tolerance rough set model. In Proceedings of the Sixth International Conference on Knowledge Discovery and Data Mining (KDD-2000) Workshop on Text Mining, Boston, MA, August 2000.
S. Kim and B.-T. Zhang. Web document retrieval by genetic learning of importance factors for HTML tags. In Proceedings of the International Workshop on Text and Web Mining, pages 13–23, Melbourne, Australia, August 2000.
Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.
T. Kohonen. Self-organising Maps. Springer, Berlin, Germany, second edition, 1997.
T. Kohonen. Self organizing maps for large documents. IEEE Transactions on Neural Networks, 11(3):574–589, June 2000. Special issue on Data Mining.
R. Kosala and H. Blockeel. Web mining research: A survey. SIGKDD Explorations, 2:1–15, July 2000.
D. H. Kraft, F. E. Petry, B. P. Buckles, and T. Sadasivan. The use of genetic programming to build queries for information retrieval. In Proceedings of the IEEE Symposium on Evolutionary Computation, Orlando, FL, 1994.
R. Krishnapuram, A. Joshi, and L. Yi. A fuzzy relative of the k-medoids algorithm with application to document and snippet clustering. In Proceedings of the IEEE International Conference on Fuzzy Systems (FUZZ-IEEE'99), Korea, 1999.
C.-H. Lee and H.-C. Yang. Developing an adaptive search engine for e-commerce using a web mining approach. In Proceedings of the International Conference on Information Technology: Coding and Computing, pages 604–608, 2001.
A.Y. Levy and D.S. Weld. Intelligent internet systems. Artificial Intelligence, 118(1–2), 2000.
J. H. Lim. Visual keywords: From text retrieval to multimedia retrieval. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval: Techniques and Applications, volume 50, pages 77–101. Physica-Verlag, Heidelberg, 2000.
W. Y. Lin, S. A. Alvarez, and C. Ruiz. Collaborative recommendation via adaptive association rule mining, August 2000.
V. Loia and P. Luongo. An evolutionary approach to automatic web page categorization and updating. In N. Zhong, Y. Yao, J. Liu, and S. Ohsuga, editors, Web Intelligence: Research and Development, LNCS 2198, pages 292–302. Springer-Verlag, Singapore, 2001.
V. Uma Maheswari, A. Siromoney, and K. M. Mehata. The variable precision rough set model for web usage mining. In Proceedings of the First Asia-Pacific Conference on Web Intelligence (WI-2001), Maebashi, Japan, October 2001.
M.J. Martin-Bautista and M.-A. Vila. A survey of genetic feature selection in mining issues. In Proceedings of the Congress on Evolutionary Computation (CEC 99), pages 13–23, 1999.
D. Merkl and A. Rauber. Document classification with unsupervised artificial neural networks. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval: Techniques and Applications, volume 50, pages 102–121. Physica-Verlag, Heidelberg, 2000.
S. Mitra and S. K. Pal. Fuzzy multi-layer perceptron, inferencing and rule generation. IEEE Transactions on Neural Networks, 6:51–63, 1995.
D. Mladenic and M. Grobelnik. Efficient text categorization. In Proceedings of the Text Mining Workshop at the 10th European Conference on Machine Learning (ECML-98), 1998.
B. Mobasher, H. Dai, T. Luo, M. Nakagawa, Y. Sun, and J. Wiltshire. Discovery of aggregate usage profiles for web personalization. In Proceedings of KDD-2000 Workshop on Web Mining for E-Commerce, Boston, MA, August 2000.
B. Mobasher, N. Jain, E.-H. (Sam) Han, and J. Srivastava. Web mining: Patterns from WWW transactions. Technical Report TR96-050, Department of Computer Science, University of Minnesota, March 1997.
B. Mobasher, V. Kumar, and E. H. Han. Clustering in a high-dimensional space using hypergraph models. Technical Report TR-97-063, University of Minnesota, Minneapolis, 1997.
D. Nauck. Using symbolic data in neuro-fuzzy classification. In Proceedings of NAFIPS’99, New York, USA, pages 536–540, June 1999.
C. V. Negoita. On the notion of relevance in information retrieval. Kybernetes, 2(3):161–165, 1973.
S. K. Pal, T. S. Dillon, and D. S. Yeung, editors. Soft Computing in Case Based Reasoning. Springer-Verlag, London, 2000.
S. K. Pal, A. Ghosh, and M. K. Kundu, editors. Soft Computing for Image Processing. Physica Verlag, Heidelberg, 2000.
S. K. Pal and S. Mitra. Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing. John Wiley, New York, 1999.
S. K. Pal, S. Mitra, and P. Mitra. Rough fuzzy MLP: Modular evolution, rule generation and evaluation. IEEE Transactions on Knowledge and Data Engineering, to appear, 2001.
S. K. Pal and A. Skowron. Rough Fuzzy Hybridization: A New Trend in Decision Making. Springer-Verlag, Singapore, 1999.
S. K. Pal, V. Talwar, and P. Mitra. Web mining in soft computing framework: Relevance, state of the art and future directions. IEEE Trans. Neural Networks, 13(5):1163–1177, 2002.
J. Pazzani and D. Billsus. Learning collaborative information filters. In Proceedings of the Fifteenth International Conference on Machine Learning, Madison, WI, 1998. Morgan Kaufmann.
M. Pazzani, J. Muramatsu, and D. Billsus. Syskill and Webert: Identifying interesting web sites. In Proceedings of the Thirteenth National Conference on AI, pages 54–61, 1996.
F. Picarougne, N. Monmarché, A. Oliver, and G. Venturini. Web mining with a genetic algorithm. In Proceedings of the Eleventh International World Wide Web Conference, Hawaii, 2002.
J. Pitkow. In search of reliable usage data on the WWW. In Proceedings of the Sixth International WWW Conference, pages 451–463, Santa Clara, CA, 1997.
L. Polkowski and A. Skowron. Rough mereology: A new paradigm for approximate reasoning. International Journal of Approximate Reasoning, 15(4):333–365, 1996.
J. Shavlik and T. Eliassi-Rad. A system for building intelligent agents that learn to retrieve and extract information. International Journal on User Modeling and User-Adapted Interaction, April 2001. Special issue on User Modeling and Intelligent Agents.
J. Shavlik and G. G. Towell. Knowledge-based artificial neural networks. Artificial Intelligence, 70(1/2):119–165, 1994.
A. Skowron and L. Polkowski, editors. Rough Sets in Knowledge Discovery. Physica-Verlag, Heidelberg, 1998.
S. Mitra, S. K. Pal, and P. Mitra. Data mining in soft computing framework: A survey. IEEE Transactions on Neural Networks, 13(1):3–14, 2002.
S. Soderland. Learning information extraction rules for semi-structured and free text. Machine Learning, 34(1/3):233–272, 1999. Special issue on Natural Language Learning.
U. Straccia. A framework for the retrieval of multimedia objects based on four-valued fuzzy description logics. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval: Techniques and Applications, volume 50, pages 332–357. Physica-Verlag, Heidelberg, 2000.
C. Wan, M. Liu, and L. Wang. Content-based sound retrieval for web application. In N. Zhong, Y. Yao, J. Liu, and S. Ohsuga, editors, Web Intelligence: Research and Development, LNCS 2198, pages 389–393. Springer-Verlag, Singapore, 2001.
S. K. Wong, Y. Y. Yao, and C. J. Butz. Granular information retrieval. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval: Techniques and Applications, volume 50, pages 317–331. Physica-Verlag, Heidelberg, 2000.
R. Yager. A framework for linguistic and hierarchical queries for document retrieval. In F. Crestani and G. Pasi, editors, Soft Computing in Information Retrieval: Techniques and Applications, volume 50, pages 3–20. Physica-Verlag, Heidelberg, 2000.
K. Yanai, M. Shindo, and K. Noshita. A fast image-gathering system on the World Wide Web using a PC cluster. In N. Zhong, Y. Yao, J. Liu, and S. Ohsuga, editors, Web Intelligence: Research and Development, LNCS 2198, pages 324–334. Springer-Verlag, Singapore, 2001.
J. J. Yang and R. Korfhage. Query modification using genetic algorithms in vector space models. TR LIS045/I592001, Department of IS, University of Pittsburgh, 1992.
L. A. Zadeh. Fuzzy logic, neural networks, and soft computing. Communications of the ACM, 37:77–84, 1994.
L. A. Zadeh. A new direction in AI: Towards a computational theory of perceptions. AI Magazine, 22:73–84, 2001.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Pal, S.K., Talwar, V., Mitra, P. (2004). Web Mining in Soft Computing Framework: A Survey. In: Loia, V., Nikravesh, M., Zadeh, L.A. (eds) Fuzzy Logic and the Internet. Studies in Fuzziness and Soft Computing, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39988-9_11
DOI: https://doi.org/10.1007/978-3-540-39988-9_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05770-0
Online ISBN: 978-3-540-39988-9
eBook Packages: Springer Book Archive