LOGML: Log Markup Language for Web Usage Mining

  • John R. Punin
  • Mukkai S. Krishnamoorthy
  • Mohammed J. Zaki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2356)


Web Usage Mining refers to the discovery of interesting information from user navigational behavior as stored in web access logs. While extracting simple information from web logs is easy, mining complex structural information is very challenging. Data cleaning and preparation require significant effort before mining can even be applied. We propose two new XML applications, XGMML and LOGML, to help with this task. XGMML is a graph description language and LOGML is a web-log report description language. We generate a web graph in XGMML format for a web site using the web robot of the WWWPal system. We generate web-log reports in LOGML format for a web site from web log files and the web graph. We further illustrate the usefulness of LOGML in web usage mining: we show how simply mining algorithms (for extracting increasingly complex frequent patterns) can be specified and implemented efficiently using LOGML.
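As a rough illustration of the kind of preparation and mining the abstract describes, the following Python sketch groups raw access-log lines into per-client page sequences and counts frequent pages and frequent consecutive page pairs. It is not the authors' LOGML format or their algorithms. The Common Log Format input, the use of the client host as a stand-in for a user session, the sample data, and the names `sessions_from_log` and `frequent_patterns` are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict

# Assumed input: Common Log Format lines (host ident user [time] "request" status bytes).
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3}) \S+'
)

# Hypothetical sample data, invented for this sketch.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2002:10:00:00 -0500] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [01/Jan/2002:10:00:05 -0500] "GET /courses.html HTTP/1.0" 200 2048',
    '10.0.0.2 - - [01/Jan/2002:10:01:00 -0500] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.2 - - [01/Jan/2002:10:01:07 -0500] "GET /courses.html HTTP/1.0" 200 2048',
    '10.0.0.2 - - [01/Jan/2002:10:01:09 -0500] "GET /missing.html HTTP/1.0" 404 0',
    '10.0.0.3 - - [01/Jan/2002:10:02:00 -0500] "GET /index.html HTTP/1.0" 200 1024',
]


def sessions_from_log(lines):
    """Data cleaning step: keep successful requests, grouped by client host.

    Grouping by host is a crude approximation of a user session.
    """
    sessions = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed lines
        host, path, status = match.groups()
        if status.startswith("2"):  # drop errors and redirects
            sessions[host].append(path)
    return dict(sessions)


def frequent_patterns(sessions, min_support):
    """Count pages and consecutive page pairs that occur in at least
    `min_support` sessions (each session counts a pattern once)."""
    pages = Counter()
    pairs = Counter()
    for visited in sessions.values():
        pages.update(set(visited))
        pairs.update(set(zip(visited, visited[1:])))
    frequent_pages = {p: c for p, c in pages.items() if c >= min_support}
    frequent_pairs = {p: c for p, c in pairs.items() if c >= min_support}
    return frequent_pages, frequent_pairs


if __name__ == "__main__":
    user_sessions = sessions_from_log(SAMPLE_LOG)
    page_counts, pair_counts = frequent_patterns(user_sessions, min_support=2)
    print(page_counts)
    print(pair_counts)
```

Single pages and consecutive pairs are the two simplest pattern types; longer sequences or tree-shaped patterns would need dedicated algorithms rather than this direct counting.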





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • John R. Punin (1)
  • Mukkai S. Krishnamoorthy (1)
  • Mohammed J. Zaki (1)
  1. Computer Science Department, Rensselaer Polytechnic Institute, Troy
