Category-Based Audience Metrics for Web Site Content Improvement Using Ontologies and Page Classification

  • Jean-Pierre Norguet
  • Benjamin Tshibasu-Kabeya
  • Gianluca Bontempi
  • Esteban Zimányi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3999)


With the emergence of the World Wide Web, analyzing and improving Web communication has become essential for adapting Web content to visitors’ expectations. Web communication analysis is traditionally performed by Web analytics software, which produces long lists of page-based audience metrics. These results suffer from page synonymy, page polysemy, page temporality, and page volatility. In addition, the metrics carry little semantics and are too detailed to be exploited by organization managers and chief editors, who need summarized, conceptual information to make high-level decisions. To obtain such metrics, we propose to classify the Web site pages into categories representing the Web site topics and to aggregate the page hits accordingly. In this paper, we show how to compute and visualize these metrics using OLAP tools. To solve the page-temporality issue, we propose to classify the versions of the pages using automatic classifiers.
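
The core idea of aggregating page hits by category can be illustrated with a minimal sketch. It is not the authors' implementation: the topic hierarchy, the page-to-category mapping, the hit counts, and the function name `category_hits` are all hypothetical, and the roll-up along the hierarchy only mimics in plain Python what an OLAP tool would do over a category dimension. Page versions and automatic classification are not modeled here.

```python
from collections import defaultdict

# Hypothetical topic hierarchy: category -> parent category (None marks a root).
PARENT = {
    "Research": None,
    "Teaching": None,
    "Machine Learning": "Research",
    "Databases": "Research",
    "Courses": "Teaching",
}

# Hypothetical classification result: page URL -> category.
PAGE_CATEGORY = {
    "/ml/index.html": "Machine Learning",
    "/ml/projects.html": "Machine Learning",
    "/db/olap.html": "Databases",
    "/courses/intro.html": "Courses",
}

# Hypothetical page-based audience metric: page URL -> number of hits.
PAGE_HITS = {
    "/ml/index.html": 120,
    "/ml/projects.html": 30,
    "/db/olap.html": 50,
    "/courses/intro.html": 40,
    "/misc.html": 7,  # unclassified page, ignored by the aggregation
}


def category_hits(page_hits, page_category, parent):
    """Sum page hits per category, rolling each hit up to all ancestors."""
    totals = defaultdict(int)
    for page, hits in page_hits.items():
        category = page_category.get(page)  # None if the page is unclassified
        while category is not None:
            totals[category] += hits
            category = parent.get(category)  # climb one level in the hierarchy
    return dict(totals)


if __name__ == "__main__":
    for name, hits in sorted(category_hits(PAGE_HITS, PAGE_CATEGORY, PARENT).items()):
        print(f"{name}: {hits}")
```

In this toy example, the two Machine Learning pages contribute 150 hits, which roll up together with the 50 Databases hits into 200 hits for Research. A manager would read the figures at the Research and Teaching level instead of a page-by-page list.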


Organization Manager; Word Sense Disambiguation; Hierarchical Aggregation; Content Journal; Advance Information System Engineer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jean-Pierre Norguet (1)
  • Benjamin Tshibasu-Kabeya (2)
  • Gianluca Bontempi (2)
  • Esteban Zimányi (1)

  1. Department of Computer & Network Engineering, Université Libre de Bruxelles, Brussels, Belgium
  2. Machine Learning Group, Département d’Informatique, Université Libre de Bruxelles, Brussels, Belgium
