SEWEBAR-CMS: semantic analytical report authoring for data mining results
SEWEBAR-CMS is a set of extensions for the Joomla! Content Management System (CMS) that adds the functionality it needs to serve as a communication platform between the data analyst, the domain expert and the report user. SEWEBAR-CMS integrates with existing data mining software through PMML. Background knowledge is entered via a web-based elicitation interface and is preserved in documents conforming to the proposed Background Knowledge Exchange Format (BKEF) specification. SEWEBAR-CMS offers web service integration with semantic knowledge bases, in which PMML and BKEF data are stored. By combining domain knowledge and mining model visualizations with the results of queries against the knowledge base, the data analyst conveys the mining results to the end user in a semi-automatically generated textual analytical report. The paper demonstrates the use of SEWEBAR-CMS on a real-world task from the cardiological domain and presents a user study showing that the proposed report authoring support leads to a statistically significant decrease in the time needed to author the analytical report.
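To illustrate the general idea of turning a PMML mining model into report text, the following is a minimal Python sketch. It is not the SEWEBAR-CMS implementation. It assumes a standard PMML 4.1 `AssociationModel` (with `Item`, `Itemset` and `AssociationRule` elements). The sample rule, the item values, the sentence template and the function name `rules_to_sentences` are hypothetical, and elements that a schema-valid PMML document would require (for example a populated `DataDictionary`) are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed PMML 4.1 fragment (not schema-valid; illustrative values only).
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <AssociationModel functionName="associationRules" numberOfTransactions="100"
      minimumSupport="0.1" minimumConfidence="0.5"
      numberOfItems="2" numberOfItemsets="2" numberOfRules="1">
    <MiningSchema/>
    <Item id="1" value="Smoking=yes"/>
    <Item id="2" value="Hypertension=yes"/>
    <Itemset id="1" numberOfItems="1"><ItemRef itemRef="1"/></Itemset>
    <Itemset id="2" numberOfItems="1"><ItemRef itemRef="2"/></Itemset>
    <AssociationRule support="0.2" confidence="0.6" antecedent="1" consequent="2"/>
  </AssociationModel>
</PMML>"""

# PMML elements live in a versioned namespace; map it to a short prefix.
NS = {"p": "http://www.dmg.org/PMML-4_1"}


def rules_to_sentences(pmml_text: str) -> list[str]:
    """Render each AssociationRule as one sentence (hypothetical template)."""
    root = ET.fromstring(pmml_text)
    model = root.find("p:AssociationModel", NS)

    # Item id -> item value, e.g. "1" -> "Smoking=yes".
    items = {i.get("id"): i.get("value") for i in model.findall("p:Item", NS)}

    # Itemset id -> list of item values referenced by its ItemRef children.
    itemsets = {
        s.get("id"): [items[r.get("itemRef")] for r in s.findall("p:ItemRef", NS)]
        for s in model.findall("p:Itemset", NS)
    }

    sentences = []
    for rule in model.findall("p:AssociationRule", NS):
        antecedent = " and ".join(itemsets[rule.get("antecedent")])
        consequent = " and ".join(itemsets[rule.get("consequent")])
        confidence = float(rule.get("confidence"))
        support = float(rule.get("support"))
        sentences.append(
            f"If {antecedent}, then {consequent} "
            f"(confidence {confidence:.0%}, support {support:.0%})."
        )
    return sentences


if __name__ == "__main__":
    for sentence in rules_to_sentences(PMML_DOC):
        print(sentence)
```

Run as a script, this prints one sentence per rule: `If Smoking=yes, then Hypertension=yes (confidence 60%, support 20%).` In a semi-automatic workflow such as the one described above, sentences like this would serve only as a draft that the analyst edits and enriches with domain knowledge.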
Keywords: Data mining, Association rules, Background knowledge, Semantic web, Content management systems, Topic maps
The work described here has been supported by Grant No. ME913 of the Ministry of Education, Youth and Sports of the Czech Republic, by Grant No. 201/08/0802 of the Czech Science Foundation, and by Grant No. IGA 21/08 of the University of Economics, Prague. We would like to thank Marie Tomečková, who gave us valuable feedback on the expert elicitation interface, and the following colleagues who contributed significantly to SEWEBAR-CMS: Jakub Balhar, Daniel Štastný, Vojtěch Jirkovský, Jan Nemrava, Stanislav Vojíř and Jan Zemánek. Last but not least, we would like to thank the teachers at the University of Economics, Prague, who devoted their time to evaluating the framework in an educational context.