Semantic Analysis of Web Site Audience by Integrating Web Usage Mining and Web Content Mining

  • Jean-Pierre Norguet
  • Esteban Zimányi
  • Ralf Steinberger
Part of the Studies in Computational Intelligence book series (SCI, volume 172)


With the emergence of the World Wide Web, analyzing and improving Web communication has become essential to adapt the Web content to the visitors’ expectations. Web communication analysis is traditionally performed by Web analytics software, which produces long lists of page-based audience metrics. These results suffer from page synonymy, page polysemy, page temporality, and page volatility. In addition, the metrics contain little semantics and are too detailed to be exploited by organization managers and chief editors, who need summarized and conceptual information to make high-level decisions. To obtain such metrics, we propose a method based on output page mining. Output page mining is a new kind of Web usage mining that sits between Web usage mining and Web content mining. In our method, we first collect the Web pages output by the Web server. Then, for a given taxonomy covering the Web site knowledge domain, we aggregate the term weights in the output pages using OLAP tools, in order to obtain topic-based metrics representing the audience of the Web site topics. To demonstrate how our approach solves the cited problems, we compute topic-based metrics with SQL Server OLAP Analysis Service and our prototype WASA for real Web sites. Finally, we compare our results against those obtained with Google Analytics, a popular Web analytics tool.


World Wide Web, Web analytics, Semantic Web, Web usage mining, Data Mining





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Jean-Pierre Norguet (1)
  • Esteban Zimányi (1)
  • Ralf Steinberger (2)
  1. Laboratory of Computer and Network Engineering, Université Libre de Bruxelles, Brussels, Belgium
  2. European Commission – Joint Research Centre, Ispra (VA), Italy
