Abstract
Access to high-quality and recent data is crucial both for decision makers in cities and for the public. Likewise, infrastructure providers could offer more tailored solutions to cities based on such data. However, even though many data sets containing relevant indicators about cities are available as open data, integrating and analyzing them is cumbersome, since collection is still a manual process and the sources are not connected to each other upfront. Further, because the available data sources cover partly disjoint sets of indicators and cities, integrating them leads to a large proportion of missing values. In this paper we present a platform for collecting, integrating, and enriching open data about cities in a reusable and comparable manner: we have integrated various open data sources and present approaches for predicting missing values, where we use standard regression methods in combination with principal component analysis (PCA) to improve the quality and quantity of predicted values. Since indicators and cities only partially overlap across data sets, we particularly focus on predicting indicator values across data sets, for which we extend, adapt, and evaluate our prediction model: as a "side product", we learn ontology mappings (simple equations and sub-properties) for pairs of indicators from different data sets. Finally, we republish the integrated and predicted values as linked open data.
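To make the idea of combining PCA with standard regression more concrete, here is a minimal sketch under stated assumptions: gaps in a city-by-indicator table are first filled with column means, the indicators are standardized, each city is projected onto the top-k principal components (found by power iteration), and ordinary least squares is fitted for one target indicator on the cities where it is observed. Mean imputation, power iteration, OLS, and the function names `mean_impute`, `top_components`, `solve`, and `predict_missing` are all hypothetical choices for illustration, not the authors' implementation; the paper's actual pipeline and regression methods may differ.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def mean_impute(rows: List[List[Optional[float]]]) -> Matrix:
    """Replace each missing value (None) by the mean of the observed values in its column.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else float(r[j]) for j in range(n_cols)] for r in rows]


def top_components(z: Matrix, k: int, iters: int = 500) -> Matrix:
    """Top-k principal directions of column-centred data z via power iteration with deflation."""
    n, d = len(z), len(z[0])
    cov = [[sum(z[r][i] * z[r][j] for r in range(n)) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    comps = []
    for _comp in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        # Deflation: remove the component just found so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve the small square system a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for i in reversed(range(m)):
        x[i] = (aug[i][m] - sum(aug[i][j] * x[j] for j in range(i + 1, m))) / aug[i][i]
    return x


def predict_missing(predictors: List[List[Optional[float]]],
                    target: List[Optional[float]], k: int) -> List[float]:
    """Fill the None entries of `target` by OLS regression on the top-k PCA scores of `predictors`."""
    x = mean_impute(predictors)
    n, d = len(x), len(x[0])
    # Standardize each indicator, since indicators come in very different units.
    mu = [sum(r[j] for r in x) / n for j in range(d)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in x) / (n - 1)) or 1.0 for j in range(d)]
    z = [[(r[j] - mu[j]) / sd[j] for j in range(d)] for r in x]
    comps = top_components(z, k)
    # Design matrix: an intercept column followed by the k principal-component scores.
    design = [[1.0] + [sum(zi * vi for zi, vi in zip(row, v)) for v in comps] for row in z]
    train = [i for i, t in enumerate(target) if t is not None]
    p = k + 1
    ata = [[sum(design[i][a] * design[i][b] for i in train) for b in range(p)] for a in range(p)]
    aty = [sum(design[i][a] * target[i] for i in train) for a in range(p)]
    beta = solve(ata, aty)
    return [t if t is not None else sum(bj * xj for bj, xj in zip(beta, design[i]))
            for i, t in enumerate(target)]
```

Choosing k smaller than the number of indicators compresses correlated indicators into a few components, which is the usual motivation for PCA here; with k equal to the number of indicators the components merely rotate the data, and the fit reduces to plain linear regression on all indicators.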
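The "side product" of learning ontology mappings between indicators from different data sets can be illustrated with a small helper. This is a sketch under assumptions: the function name `learn_mapping`, the restriction to equations of the form q = c · p (least squares through the origin), and the 5% relative-error threshold are all hypothetical, and the paper's mapping model may consider other equation forms and criteria. On the cities covered by both data sets it fits the factor c and, if the fit is good, proposes either a sub-property candidate (c close to 1) or a simple scaling equation.

```python
from typing import Dict, Optional, Tuple


def learn_mapping(p: Dict[str, float], q: Dict[str, float],
                  max_rel_error: float = 0.05,
                  min_overlap: int = 3) -> Optional[Tuple[str, float]]:
    """Propose a mapping between indicator p (data set A) and indicator q (data set B).

    Fits q ~ c * p by least squares through the origin on the cities both data sets cover.
    Returns ("subPropertyOf", 1.0) when the values essentially coincide (c close to 1),
    ("equation", c) for a good fit with another factor, and None when the overlap is
    too small or the fit is poor.
    """
    cities = sorted(set(p) & set(q))
    if len(cities) < min_overlap:
        return None
    sxx = sum(p[c] ** 2 for c in cities)
    if sxx == 0.0:
        return None
    factor = sum(p[c] * q[c] for c in cities) / sxx
    # Mean relative error of the fitted equation on the shared cities (zero values skipped).
    errors = [abs(factor * p[c] - q[c]) / abs(q[c]) for c in cities if q[c] != 0.0]
    if not errors or sum(errors) / len(errors) > max_rel_error:
        return None
    if abs(factor - 1.0) <= max_rel_error:
        # Candidate for p rdfs:subPropertyOf q; the direction is not decided here.
        return ("subPropertyOf", 1.0)
    return ("equation", factor)
```

For instance, an indicator reported in thousands in one data set and in absolute numbers in another would yield `("equation", 1000.0)`, which could then be published as a simple attribute equation between the two properties.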
Compared to an informal, preliminary version of this paper presented at the Know@LOD 2015 workshop, Sections 5, 6, and 8 are entirely new, and more data sources have been integrated.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bischof, S., Martin, C., Polleres, A., Schneider, P. (2015). Collecting, Integrating, Enriching and Republishing Open City Data as Linked Data. In: Arenas, M., et al. (eds.) The Semantic Web - ISWC 2015. ISWC 2015. Lecture Notes in Computer Science, vol 9367. Springer, Cham. https://doi.org/10.1007/978-3-319-25010-6_4
DOI: https://doi.org/10.1007/978-3-319-25010-6_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25009-0
Online ISBN: 978-3-319-25010-6
eBook Packages: Computer Science, Computer Science (R0)