A Roadmap for Web Mining: From Web to Semantic Web

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3209)

Abstract

The purpose of Web mining is to develop methods and systems for discovering models of objects and processes on the World Wide Web and for web-based systems that show adaptive performance. Web mining integrates three parent areas: Data Mining (we use this term here also for the closely related areas of Machine Learning and Knowledge Discovery), Internet technology and the World Wide Web, and the more recent Semantic Web. The World Wide Web has made an enormous amount of information electronically accessible. Email, news, and markup languages like HTML allow users to publish and read documents on a worldwide scale and to communicate via chat connections, including information in the form of images and voice recordings. The HTTP protocol, which enables access to documents over the network via Web browsers, brought an immense improvement in communication and access to information. For some years these possibilities were used mostly in the scientific world, but recent years have seen an immense growth in popularity, supported by the wide availability of computers and broadband communication. The use of the Internet for tasks other than finding information and direct communication is increasing, as can be seen from the interest in “e-activities” such as e-commerce, e-learning, e-government, and e-science.



Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berendt, B., Hotho, A., Mladenić, D., van Someren, M., Spiliopoulou, M., Stumme, G. (2004). A Roadmap for Web Mining: From Web to Semantic Web. In: Berendt, B., Hotho, A., Mladenić, D., van Someren, M., Spiliopoulou, M., Stumme, G. (eds) Web Mining: From Web to Semantic Web. EWMF 2003. Lecture Notes in Computer Science, vol 3209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30123-3_1

  • DOI: https://doi.org/10.1007/978-3-540-30123-3_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23258-2

  • Online ISBN: 978-3-540-30123-3

  • eBook Packages: Springer Book Archive
