Generating Possible Interpretations for Statistics from Linked Open Data

  • Heiko Paulheim
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7295)


Statistics are a constant presence in our daily lives. Every day, new statistics are published, showing, for example, the perceived quality of living in different cities or the corruption index of different countries. Interpreting those statistics, however, is a difficult task. Statistics often cover only very few attributes, so it is hard to come up with hypotheses that explain, e.g., why the perceived quality of living is higher in one city than in another. In this paper, we introduce Explain-a-LOD, an approach that uses data from Linked Open Data to generate hypotheses that explain statistics. We present an implemented prototype and compare different approaches for generating hypotheses by analyzing the perceived quality of those hypotheses in a user study.


Keywords: Association Rule, User Study, Rule Learning, Entity Recognition, Linked Open Data



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Heiko Paulheim, Knowledge Engineering Group, Technische Universität Darmstadt, Germany
