Abstract

The World-Wide Web gives every internet citizen access to an abundance of information, but identifying the relevant pieces is becoming increasingly difficult. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to Web data and documents. This chapter provides a brief overview of web mining techniques and research areas, most notably hypertext classification, wrapper induction, recommender systems, and web usage mining.

Copyright information

© 2005 Springer Science+Business Media, Inc.

About this chapter

Cite this chapter

Fürnkranz, J. (2005). Web Mining. In: Maimon, O., Rokach, L. (eds) Data Mining and Knowledge Discovery Handbook. Springer, Boston, MA. https://doi.org/10.1007/0-387-25465-X_42

  • DOI: https://doi.org/10.1007/0-387-25465-X_42

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-387-24435-8

  • Online ISBN: 978-0-387-25465-4

  • eBook Packages: Computer Science (R0)
