A Roadmap for Web Mining: From Web to Semantic Web

Berendt, Bettina; Hotho, Andreas; Mladenic, Dunja; van Someren, Maarten; Spiliopoulou, Myra; Stumme, Gerd

doi:10.1007/978-3-540-30123-3_1

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3209)

Abstract

The purpose of Web mining is to develop methods and systems for discovering models of objects and processes on the World Wide Web, and for web-based systems that show adaptive performance. Web mining integrates three parent areas: Data Mining (we use this term here also for the closely related areas of Machine Learning and Knowledge Discovery), Internet technology and the World Wide Web, and the more recent Semantic Web. The World Wide Web has made an enormous amount of information electronically accessible. Email, news, and markup languages like HTML allow users to publish and read documents on a worldwide scale and to communicate via chat connections, including information in the form of images and voice recordings. The HTTP protocol, which enables access to documents over the network via Web browsers, brought an immense improvement in communication and access to information. For some years these possibilities were used mostly in the scientific world, but recent years have seen an immense growth in popularity, supported by the wide availability of computers and broadband communication. The Internet is increasingly used for tasks other than finding information and direct communication, as can be seen from the interest in “e-activities” such as e-commerce, e-learning, e-government, and e-science.

References

[1] Michalski, R., Bratko, I., Kubat, M. (eds.): Machine Learning and Data Mining: Methods and Applications. John Wiley and Sons, Chichester (1998)
[2] Paliouras, G., Karkaletsis, V., Spyropoulos, C. (eds.): Machine Learning and its Applications. Springer, Heidelberg (2001)
[3] Franke, J., Nakhaeizadeh, G., Renz, I. (eds.): Text Mining: Theoretical Aspects and Applications. Physica-Verlag, Heidelberg (2003)
[4] Berners-Lee, T., Fischetti, M.: Weaving the Web. Harper, San Francisco (1999)
[5] Berendt, B., Stumme, G., Hotho, A.: Usage mining for and on the semantic web. In: Data Mining: Next Generation Challenges and Future Directions, pp. 467–486. AAAI/MIT Press (2004)
[6] Berendt, B., Hotho, A., Stumme, G.: Towards semantic web mining. In: [73], pp. 264–278 (2002)
[7] Mladenić, D., Grobelnik, M.: Feature selection on hierarchy of web documents. Decision Support Systems 35, 45–87 (2003)
[8] Erdmann, M.: Ontologien zur konzeptuellen Modellierung der Semantik von XML [Ontologies for the conceptual modelling of the semantics of XML]. University of Karlsruhe (2001), ISBN 3831126356
[9] W3C: RDF/XML Syntax Specification (Revised). W3C Recommendation (2004), http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/
[10] Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)
[11] Mitchell, T.: Machine Learning. McGraw Hill, New York (1997)
[12] Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press, Cambridge (2001)
[13] Weiss, S.M., Indurkhya, N.: Predictive Data Mining: A Practical Guide. Morgan Kaufmann, San Francisco (1997)
[14] Lavrac, N., Dzeroski, S.: Inductive Logic Programming: Techniques and Applications. Ellis Horwood, New York (1994)
[15] Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: SIGMOD, Washington D.C., USA, pp. 207–216 (1993)
[16] Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Bocca, J.B., Jarke, M., Zaniolo, C. (eds.) Proc. 20th Int. Conf. Very Large Data Bases, VLDB, pp. 487–499. Morgan Kaufmann, San Francisco (1994)
[17] Adamo, J.M.: Data Mining and Association Rules for Sequential Patterns: Sequential and Parallel Algorithms. Springer, New York (2001)
[18] Roddick, J., Spiliopoulou, M.: A survey of temporal knowledge discovery paradigms and methods. IEEE Transactions on Knowledge and Data Engineering (2002)
[19] Lan, B., Bressan, S., Ooi, B.: Making web servers pushier. In: Masand, B., Spiliopoulou, M. (eds.) WebKDD 1999. LNCS (LNAI), vol. 1836, pp. 108–122. Springer, Heidelberg (2000)
[20] Scheffer, T., Wrobel, S.: A sequential sampling algorithm for a general class of utility criteria. In: Knowledge Discovery and Data Mining, pp. 330–334 (2000)
[21] Zaki, M., Lesh, N., Ogihara, M.: Mining features for sequence classification. In: KDD 1999, pp. 342–346. ACM, New York (1999)
[22] Weiss, G.M., Hirsh, H.: Learning to predict rare events in event sequences. In: Agrawal, R., Stolorz, P., Piatetsky-Shapiro, G. (eds.) Proc. of 4th Int. Conf. KDD, New York, NY, pp. 359–363 (1998)
[23] Brin, S., Page, L.: The anatomy of a large-scale hypertextual web search engine. In: Proceedings of the Seventh International Conference on World Wide Web. Elsevier Science Publishers, Amsterdam (1998)
[24] Kleinberg, J.M.: Authoritative sources in a hyperlinked environment. Journal of the ACM 46, 604–632 (1999)
[25] Cooley, R., Mobasher, B., Srivastava, J.: Data preparation for mining world wide web browsing patterns. Knowledge and Information Systems 1, 5–32 (1999)
[26] Spiliopoulou, M., Mobasher, B., Berendt, B., Nakagawa, M.: A framework for the evaluation of session reconstruction heuristics in web usage analysis. In: Raschid, L., Tuzhilin, A. (eds.) INFORMS Journal on Computing, Special Issue on Mining Web-based Data for E-Business Applications (2003)
[27] McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A.: Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the 15th International Conference on Machine Learning (ICML 1998). Morgan Kaufmann, San Francisco (1998)
[28] Mladenic, D.: Web browsing using machine learning on text data. In: Szczepaniak, P. (ed.) Intelligent Exploration of the Web, vol. 111, pp. 288–303. Physica-Verlag, Heidelberg (2002)
[29] Koller, D., Sahami, M.: Hierarchically classifying documents using very few words. In: Proceedings of the 14th International Conference on Machine Learning (ICML 1997), pp. 170–178 (1997)
[30] Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34, 1–47 (2002)
[31] Androutsopoulos, I., Koutsias, J., Chandrinos, K., Paliouras, G., Spyropoulos, C.: An evaluation of naive Bayesian anti-spam filtering. In: Potamias, G., Moustakis, V., van Someren, M. (eds.) Proceedings of the Workshop on Machine Learning in the New Information Age (2000)
[32] Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: Grobelnik, M., Mladenic, D., Milic-Frayling, N. (eds.) Proceedings of the KDD Workshop on Text Mining (2000)
[33] Zamir, O., Etzioni, O.: Web document clustering: A feasibility demonstration. In: Research and Development in Information Retrieval, pp. 46–54 (1998)
[34] Califf, M.E., Mooney, R.J.: Bottom-up relational learning of pattern matching rules for information extraction. Journal of Machine Learning Research 4, 177–210 (2003)
[35] Freitag, D., Kushmerick, N.: Boosted wrapper induction. In: Proceedings AAAI 2000, pp. 577–583 (2000)
[36] Meng, X., Hu, D., Li, C.: Schema-guided wrapper maintenance for web-data extraction. In: ACM Fifth International Workshop on Web Information and Data Management, WIDM 2003 (2003)
[37] Kushmerick, N., Thomas, B.: Adaptive information extraction: Core technologies for information agents. In: Intelligent Information Agents R&D in Europe: An AgentLink Perspective, pp. 79–103. Springer, Berlin (2004)
[38] Perkowitz, M., Etzioni, O.: Adaptive web sites: Automatically synthesizing web pages. In: Proc. of AAAI/IAAI 1998, pp. 727–732 (1998)
[39] Lin, W., Alvarez, S., Ruiz, C.: Efficient adaptive-support association rule mining for recommender systems. Data Mining and Knowledge Discovery 6, 83–105 (2002)
[40] Mobasher, B., Dai, H., Luo, T., Nakagawa, M.: Discovery and evaluation of aggregate usage profiles for web personalization. Data Mining and Knowledge Discovery 6, 61–82 (2002)
[41] Baumgarten, M., Büchner, A.G., Anand, S.S., Mulvenna, M.D., Hughes, J.G.: Navigation pattern discovery from internet data. In: [74], pp. 70–87 (2000)
[42] Borges, J.L., Levene, M.: Data mining of user navigation patterns. In: Spiliopoulou, M., Masand, B. (eds.) Advances in Web Usage Analysis and User Profiling, pp. 92–111. Springer, Berlin (2000)
[43] Spiliopoulou, M.: The laborious way from data mining to web mining. Int. Journal of Comp. Sys., Sci. & Eng., Special Issue on “Semantics of the Web” 14, 113–126 (1999)
[44] Cutler, M.: E-metrics: Tomorrow’s business metrics today. In: KDD 2000. ACM Press, Boston (2000)
[45] Domingos, P., Richardson, M.: Mining the network value of customers. In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2001, pp. 57–66. ACM, New York (2001)
[46] Schwartz, M., Wood, D.: Discovering shared interests using graph analysis. Communications of the ACM 36, 78–89 (1993)
[47] Kautz, H., Selman, B., Shah, M.: ReferralWeb: Combining social networks and collaborative filtering. Communications of the ACM 40, 63–66 (1997)
[48] Bloehdorn, S., Hotho, A.: Boosting for text classification with semantic features. In: Proc. of the Mining for and from the Semantic Web Workshop at KDD 2004 (2004)
[49] Zaiane, O., Simoff, S.: MDM/KDD: Multimedia data mining for the second time. SIGKDD Explorations 3 (2003)
[50] Hotho, A., Staab, S., Stumme, G.: WordNet improves text document clustering. In: Proc. of the SIGIR 2003 Semantic Web Workshop, Toronto, Canada (2003)
[51] McCallum, A., Rosenfeld, R., Mitchell, T., Ng, A.: Improving text classification by shrinkage in a hierarchy of classes. In: Proceedings of the 15th International Conference on Machine Learning (ICML 1998). Morgan Kaufmann, San Francisco (1998)
[52] Hotho, A., Staab, S., Stumme, G.: Explaining text clustering results using semantic structures. In: Proc. of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD, pp. 217–228 (2003)
[53] Maedche, A., Staab, S.: Ontology learning for the semantic web. IEEE Intelligent Systems 16, 72–79 (2001)
[54] Cimiano, P., Hotho, A., Staab, S.: Comparing conceptual, partitional and agglomerative clustering for learning taxonomies from text. In: Proceedings of the European Conference on Artificial Intelligence, ECAI 2004 (2004)
[55] Handschuh, S., Staab, S.: Authoring and annotation of web pages in CREAM. In: Proc. of WWW Conference (2002)
[56] Hotho, A., Staab, S., Stumme, G.: Explaining text clustering results using semantic structures. In: Proceedings of ECML/PKDD, pp. 217–228. Springer, Heidelberg (2003)
[57] Hovy, E.: Combining and standardizing large-scale, practical ontologies for machine translation and other uses. In: Proc. 1st Intl. Conf. on Language Resources and Evaluation (LREC), Granada (1998)
[58] Chalupsky, H.: OntoMorph: A translation system for symbolic knowledge. In: Principles of Knowledge Representation and Reasoning: Proceedings of the Seventh International Conference (KR 2000), pp. 471–482 (2000)
[59] McGuinness, D., Fikes, R., Rice, J., Wilder, S.: An environment for merging and testing large ontologies. In: Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR 2000), Breckenridge, Colorado, USA, pp. 483–493 (2000)
[60] Noy, N., Musen, M.: PROMPT: Algorithm and tool for automated ontology merging and alignment. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI 2000), Austin, Texas, pp. 450–455 (2000)
[61] Stumme, G., Maedche, A.: FCA-Merge: Bottom-up merging of ontologies. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI 2001), pp. 225–230 (2001)
[62] Doan, A., Madhavan, J., Domingos, P., Halevy, A.: Ontology matching: A machine learning approach. In: Handbook on Ontologies, pp. 385–404. Springer, Berlin (2004)
[63] Heß, A., Kushmerick, N.: Machine learning for annotating semantic web services. In: Proceedings of the First International Semantic Web Services Symposium. AAAI Spring Symposium Series, vol. 2 (2004)
[64] Aguado, B., Merceron, A., Voisard, A.: Extracting information from structured exercises. In: Proceedings of the 4th International Conference on Information Technology Based Higher Education and Training, ITHET 2003, Marrakech, Morocco (2003)
[65] Tane, J., Schmitz, C., Stumme, G.: Semantic resource management for the web: An e-learning application. In: Proc. 13th International World Wide Web Conference, WWW 2004 (2004)
[66] Althoff, K., Becker-Kornstaedt, U., Decker, B., Klotz, A., Leopold, E., Rech, J., Voss, A.: The INDiGo project: Enhancement of experience management and process learning with moderated discourses. In: Perner, P. (ed.) Data Mining in Marketing and Medicine. Springer, Berlin (2002)
[67] Yihune, G.: Evaluation eines medizinischen Informationssystems im World Wide Web. Nutzungsanalyse am Beispiel [Evaluation of a medical information system on the World Wide Web: a usage analysis by example]. PhD thesis, Ruprecht-Karls-Universität Heidelberg (2003), http://www.dermis.net
[68] Kralisch, A., Berendt, B.: Cultural determinants of search behaviour on websites. In: Proceedings of the IWIPS 2004 Conference on Culture, Trust, and Design Innovation (2004)
[69] Heino, J., Toivonen, H.: Automated detection of epidemics from the usage logs of a physicians’ reference database. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) PKDD 2003. LNCS (LNAI), vol. 2838, pp. 180–191. Springer, Heidelberg (2003)
[70] Mladenic, D., Lavrac, N., Bohanec, M., Moyle, S. (eds.): Data Mining and Decision Support: Integration and Collaboration. Kluwer Academic Publishers, Dordrecht (2003)
[71] Evfimievski, A., Srikant, R., Agrawal, R., Gehrke, J.: Privacy preserving mining of association rules. In: [75], pp. 217–228 (2002)
[72] Iyengar, V.: Transforming data to satisfy privacy constraints. In: [75], pp. 279–288 (2002)
[73] Horrocks, I., Hendler, J.A. (eds.): The Semantic Web: Proceedings of the First International Semantic Web Conference. Springer, Heidelberg (2002)
[74] Masand, B., Spiliopoulou, M. (eds.): WebKDD 1999. LNCS (LNAI), vol. 1836. Springer, Heidelberg (2000)
[75] Hand, D., Keim, D., Ng, R. (eds.): KDD 2002: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York (2002)

Author information

Authors and Affiliations

Institute of Information Systems, Humboldt University Berlin, Germany
Bettina Berendt
Chair of Knowledge & Data Engineering, University of Kassel, Germany
Andreas Hotho & Gerd Stumme
Jozef Stefan Institute, Ljubljana, Slovenia
Dunja Mladenic
Social Science Informatics, University of Amsterdam, The Netherlands
Maarten van Someren
Institute of Technical and Business Information Systems, Otto–von–Guericke–University Magdeburg, Germany
Myra Spiliopoulou

Editor information

Editors and Affiliations

Department of Computer Science, K.U. Leuven, B-3001, Heverlee, Belgium
Bettina Berendt
Knowledge & Data Engineering Group, University of Kassel, Wilhelmshöher Allee 73, D-34121, Kassel, Germany
Andreas Hotho
Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Dunja Mladenič
Human Computer Studies Lab, University of Amsterdam, Kruislaan 419, 1089 VA, Amsterdam, The Netherlands
Maarten van Someren
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou
Research Center L3S, Appelstr. 9a, D-30167, Hannover, Germany
Gerd Stumme

About this paper

Cite this paper

Berendt, B., Hotho, A., Mladenic, D., van Someren, M., Spiliopoulou, M., Stumme, G. (2004). A Roadmap for Web Mining: From Web to Semantic Web. In: Berendt, B., Hotho, A., Mladenič, D., van Someren, M., Spiliopoulou, M., Stumme, G. (eds) Web Mining: From Web to Semantic Web. EWMF 2003. Lecture Notes in Computer Science, vol 3209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30123-3_1

DOI: https://doi.org/10.1007/978-3-540-30123-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23258-2
Online ISBN: 978-3-540-30123-3
eBook Packages: Springer Book Archive
