Abstract
The purpose of Web mining is to develop methods and systems for discovering models of objects and processes on the World Wide Web and for web-based systems that show adaptive performance. Web mining integrates three parent areas: Data Mining (we use this term here also for the closely related areas of Machine Learning and Knowledge Discovery), Internet technology and the World Wide Web, and the more recent Semantic Web. The World Wide Web has made an enormous amount of information electronically accessible. The use of email, news and markup languages like HTML allows users to publish and read documents on a world-wide scale and to communicate via chat connections, including information in the form of images and voice records. The HTTP protocol, which enables access to documents over the network via Web browsers, brought an immense improvement in communication and access to information. For some years these possibilities were used mostly in the scientific world, but recent years have seen an immense growth in popularity, supported by the wide availability of computers and broadband communication. The use of the Internet for tasks other than finding information and direct communication is increasing, as can be seen from the interest in “e-activities” such as e-commerce, e-learning, e-government and e-science.
© 2004 Springer-Verlag Berlin Heidelberg
Berendt, B., Hotho, A., Mladenić, D., van Someren, M., Spiliopoulou, M., Stumme, G. (2004). A Roadmap for Web Mining: From Web to Semantic Web. In: Berendt, B., Hotho, A., Mladenić, D., van Someren, M., Spiliopoulou, M., Stumme, G. (eds) Web Mining: From Web to Semantic Web. EWMF 2003. Lecture Notes in Computer Science, vol 3209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30123-3_1
DOI: https://doi.org/10.1007/978-3-540-30123-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23258-2
Online ISBN: 978-3-540-30123-3