Supporting Tabular Data Characterization in a Large Scale Data Infrastructure by Lexical Matching Techniques

  • Leonardo Candela
  • Gianpaolo Coro
  • Pasquale Pagano
Conference paper

DOI: 10.1007/978-3-642-35834-0_5

Volume 354 of the book series Communications in Computer and Information Science (CCIS)
Cite this paper as:
Candela L., Coro G., Pagano P. (2013) Supporting Tabular Data Characterization in a Large Scale Data Infrastructure by Lexical Matching Techniques. In: Agosti M., Esposito F., Ferilli S., Ferro N. (eds) Digital Libraries and Archives. IRCDL 2012. Communications in Computer and Information Science, vol 354. Springer, Berlin, Heidelberg

Abstract

Digital Libraries continue to evolve towards research environments that support access to, and management of, multiform Information Objects spread across multiple data sources and organizational domains. This evolution has introduced the need to deal with Information Objects whose traits differ from those that characterized Digital Libraries in their early stages, and to revise the services supporting their management. Tabular data are a class of Information Objects that must be managed efficiently because of their core role in many eScience scenarios. This paper discusses the tabular data characterization problem, i.e., the problem of identifying the reference dataset for each column of a dataset. In particular, the paper presents an approach based on lexical matching techniques that supports users during the data curation phase by providing them with a ranked list of reference datasets suitable for a given dataset column.
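The abstract only outlines the matching strategy. As an illustration, here is a minimal Python sketch that ranks candidate reference datasets for one column by the share of its values that lexically match some entry of each candidate. It assumes each reference dataset is a flat list of labels. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure and a hypothetical matching threshold of 0.8. This is not the authors' technique, and the function names and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(value.lower().split())


def best_match_score(value: str, reference: list[str]) -> float:
    """Highest string similarity (0..1) between one cell value and any reference entry.

    SequenceMatcher.ratio() is a stand-in lexical measure, not the paper's own.
    """
    v = normalize(value)
    return max(
        (SequenceMatcher(None, v, normalize(r)).ratio() for r in reference),
        default=0.0,
    )


def rank_reference_datasets(
    column: list[str],
    references: dict[str, list[str]],
    threshold: float = 0.8,  # hypothetical cut-off for counting a value as matched
) -> list[tuple[str, float]]:
    """Rank reference datasets by the fraction of non-empty column values
    whose best match in that dataset reaches the threshold."""
    values = [c for c in column if c.strip()]
    scores = []
    for name, entries in references.items():
        if not values:
            scores.append((name, 0.0))
            continue
        matched = sum(1 for v in values if best_match_score(v, entries) >= threshold)
        scores.append((name, matched / len(values)))
    # Highest coverage first; ties broken by name for a deterministic order.
    return sorted(scores, key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    # Hypothetical column and reference datasets, for illustration only.
    column = ["Thunnus albacares", "Thunnus obesus", "Katsuwonus pelamis"]
    references = {
        "species": ["Thunnus albacares", "Thunnus obesus", "Katsuwonus pelamis", "Xiphias gladius"],
        "countries": ["Italy", "France", "Spain"],
    }
    print(rank_reference_datasets(column, references))
```

With this sample input every column value matches an entry of `species`, while no value comes close to a country name, so `species` is ranked first with coverage 1.0 and `countries` last with 0.0. Using fuzzy rather than exact matching lets values with small typos (for example "Thunus albacares") still count towards the right reference dataset.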

Keywords

tabular data management, data curation, large-scale data infrastructure, lexical similarity


Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Leonardo Candela (1)
  • Gianpaolo Coro (1)
  • Pasquale Pagano (1)
  1. Istituto di Scienza e Tecnologie dell’Informazione “Alessandro Faedo”, Consiglio Nazionale delle Ricerche, Pisa, Italy