“Semantics Inside!” But Let’s Not Tell the Data Miners: Intelligent Support for Data Mining

  • Jörg-Uwe Kietz
  • Floarea Serban
  • Simon Fischer
  • Abraham Bernstein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8465)


Knowledge Discovery in Databases (KDD) has evolved significantly over the past years and reached a mature stage offering plenty of operators to solve complex data analysis tasks. User support for building data analysis workflows, however, has not progressed sufficiently: the large number of operators currently available in KDD systems and the interactions between these operators complicate successful data analysis.

To help data miners, we enhanced RapidMiner, one of the most widely used open-source data mining tools, with semantic technologies. Specifically, we first semantically annotated all elements involved in the Data Mining (DM) process (the data, the operators, models, data mining tasks, and KDD workflows) using our eProPlan modelling tool, which lets users describe operators and build a task/method decomposition grammar that specifies the desired workflows embedded in an ontology. Second, we enhanced RapidMiner to employ these semantic annotations to actively support data analysts. Third, we built an Intelligent Discovery Assistant, eIda, that leverages the semantic annotations as well as HTN planning to automatically support KDD process generation.

We found that the use of Semantic Web approaches and technologies in the KDD domain helped us to lower the barrier to data analysis. We also found that using a generic ontology editor overwhelmed KDD-centric users. We therefore provided them with problem-centric extensions to Protégé. Last, and most surprisingly, we found that our semantic modeling of the KDD domain served as a rapid prototyping approach for several hard-coded improvements of RapidMiner, namely correctness checking of workflows and quick-fixes. This reinforces the finding that even a little semantic modeling can go a long way in improving the understanding of a domain, even for domain experts.







Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jörg-Uwe Kietz (1)
  • Floarea Serban (1)
  • Simon Fischer (2)
  • Abraham Bernstein (1)
  1. DDIS, University of Zurich, Switzerland
  2. Rapid-I, Dortmund, Germany
