Information Systems and e-Business Management, Volume 11, Issue 4, pp 569–595

Using statistics, visualization and data mining for monitoring the quality of meta-data in web portals

  • Marcos Aurélio Domingues
  • Carlos Soares
  • Alípio Mário Jorge
Original Article


The goal of many web portals is to select, organize and distribute content in order to satisfy their users and customers. This process usually relies on meta-data that represent and describe the content. In this paper we describe a methodology and a system for monitoring the quality of the meta-data used to describe content in web portals. The methodology analyzes the meta-data using statistics, visualization and data mining tools. It enables the site’s editor to detect and correct problems in the description of content, thus improving the quality of the web portal and the satisfaction of its users. We also define a general architecture for a system that supports the proposed methodology. We implemented this system and tested it on a Portuguese portal for management executives. The results validate the proposed methodology.


Keywords: Web data analysis; Meta-data quality; Quality of process; Content management; Web portals



This work was partially funded by PortalExecutivo. The authors are grateful to PortalExecutivo for their support, and, in particular, to Rui Brandão and Carlos Sampaio for their collaboration.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Marcos Aurélio Domingues (1, corresponding author)
  • Carlos Soares (2)
  • Alípio Mário Jorge (3)

  1. INESC TEC, INESC Technology and Science (formerly INESC Porto), Porto, Portugal
  2. INESC TEC (formerly INESC Porto) and Faculty of Economics, University of Porto, Porto, Portugal
  3. LIAAD/INESC TEC and Faculty of Sciences, University of Porto, Porto, Portugal
