Journal of Intelligent Information Systems, Volume 37, Issue 3, pp 371–395

SEWEBAR-CMS: semantic analytical report authoring for data mining results

  • Tomáš Kliegr
  • Vojtěch Svátek
  • Martin Ralbovský
  • Milan Šimůnek

Abstract

SEWEBAR-CMS is a set of extensions for the Joomla! Content Management System (CMS) that adds the functionality it needs to serve as a communication platform among the data analyst, the domain expert and the report user. SEWEBAR-CMS integrates with existing data mining software through the Predictive Model Markup Language (PMML). Background knowledge is entered via a web-based elicitation interface and is preserved in documents conforming to the proposed Background Knowledge Exchange Format (BKEF) specification. SEWEBAR-CMS offers web service integration with semantic knowledge bases, in which PMML and BKEF data are stored. By combining domain knowledge and mining model visualizations with the results of queries against the knowledge base, the data analyst conveys the mining results to the end user through a semi-automatically generated textual analytical report. The paper demonstrates the use of SEWEBAR-CMS on a real-world task from the cardiological domain and presents a user study showing that the proposed report authoring support leads to a statistically significant decrease in the time needed to author the analytical report.

Keywords

Data mining, Association rules, Background knowledge, Semantic web, Content management systems, Topic maps


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Tomáš Kliegr (1, 2)
  • Vojtěch Svátek (1)
  • Martin Ralbovský (1)
  • Milan Šimůnek (1)

  1. Faculty of Informatics and Statistics, University of Economics, Prague, Praha 3, Czech Republic
  2. Multimedia and Vision Research Group, Queen Mary, University of London, London, UK
