Web Mining in Soft Computing Framework: A Survey

  • Chapter
Fuzzy Logic and the Internet

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 137)

Summary

The chapter deals with the use of different soft computing tools to achieve web intelligence. It summarizes the characteristics of web data, the basic components of web mining and its different types, and the current state of the art in each. The reason for treating web mining as a field separate from data mining is explained. The limitations of some existing web mining methods and tools are described, and the significance of soft computing (comprising fuzzy logic, artificial neural networks, genetic algorithms and rough sets) is highlighted. A survey of the existing literature on ‘soft web mining’ is provided, along with the commercially available systems. Prospective areas of web mining where the application of soft computing needs immediate attention are outlined with justification. The scope for future research in developing ‘soft web mining’ systems is explained. An extensive bibliography is also provided.

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Pal, S.K., Talwar, V., Mitra, P. (2004). Web Mining in Soft Computing Framework: A Survey. In: Loia, V., Nikravesh, M., Zadeh, L.A. (eds) Fuzzy Logic and the Internet. Studies in Fuzziness and Soft Computing, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39988-9_11

  • DOI: https://doi.org/10.1007/978-3-540-39988-9_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05770-0

  • Online ISBN: 978-3-540-39988-9

  • eBook Packages: Springer Book Archive
