Tabular Data Cleaning and Linked Data Generation with Grafterizer

  • Dina Sukhobok
  • Nikolay Nikolov
  • Antoine Pultier
  • Xianglin Ye
  • Arne Berre
  • Rick Moynihan
  • Bill Roberts
  • Brian Elvesæter
  • Nivethika Mahasivam
  • Dumitru Roman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9989)

Abstract

Over the past several years the amount of published open data has increased significantly. The majority of it is tabular data, which requires powerful and flexible approaches to data cleaning and preparation before it can be converted into Linked Data. This paper introduces Grafterizer, a software framework developed to support data workers and data developers in converting raw tabular data into Linked Data. Its main components include Grafter, a software library and DSL for data cleaning and RDF-ization, and Grafterizer, a user interface for the interactive specification of data transformations, together with a back-end for managing and executing those transformations. The proposed demonstration will focus on Grafterizer’s features for data cleaning and RDF-ization in a scenario using data about the risk of failure of transport infrastructure components due to natural hazards.
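
To make the two steps named above concrete (cleaning tabular data, then RDF-ization), the following is a minimal sketch in plain Python. It does not use Grafter or its DSL, and it is not the authors' implementation. The sample rows, the column names, the `http://example.org/` namespace, and the helper names `clean_rows` and `to_ntriples` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data: components, hazards and risk scores are invented.
RAW_CSV = """component, hazard ,risk
Bridge A,flood,0.12
,,
Tunnel B, landslide,
Road C,flood,0.30
"""

BASE = "http://example.org/"  # placeholder namespace, not taken from the paper
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"


def clean_rows(text):
    """Cleaning step: trim whitespace, drop empty rows and rows lacking a risk value."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    for raw in reader:
        values = [value.strip() for value in raw]
        if not any(values):
            continue  # row is blank or contains only separators
        row = dict(zip(header, values))
        if not row.get("risk"):
            continue  # nothing to publish without a risk score
        row["risk"] = float(row["risk"])
        yield row


def to_ntriples(rows):
    """RDF-ization step: emit N-Triples lines for each cleaned row.

    IRI construction is deliberately naive (spaces become underscores).
    """
    for row in rows:
        local_name = row["component"].replace(" ", "_")
        subject = f"<{BASE}component/{local_name}>"
        hazard = row["hazard"]
        risk = row["risk"]
        yield f'{subject} <{BASE}hazard> "{hazard}" .'
        yield f'{subject} <{BASE}riskOfFailure> "{risk}"^^<{XSD_DOUBLE}> .'


triples = list(to_ntriples(clean_rows(RAW_CSV)))
for line in triples:
    print(line)
```

Keeping the cleaning and mapping steps as separate functions mirrors the split between data cleaning and RDF-ization described in the abstract. A real pipeline would also need proper IRI and literal escaping.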

Keywords

Open data, Linked data, Tabular data cleaning and preparation, Data transformation

Notes

Acknowledgements

This work was partly funded by the European Commission within the following research projects: DaPaaS (FP7 610988), SmartOpenData (FP7 603824), InfraRisk (FP7 603960), and proDataMarket (H2020 644497).

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Dina Sukhobok (1)
  • Nikolay Nikolov (1)
  • Antoine Pultier (1)
  • Xianglin Ye (1)
  • Arne Berre (1)
  • Rick Moynihan (2)
  • Bill Roberts (2)
  • Brian Elvesæter (1)
  • Nivethika Mahasivam (1)
  • Dumitru Roman (1)

  1. SINTEF, Oslo, Norway
  2. Swirrl IT LTD., Stirlingshire, UK
