Preliminary Analysis of Data Sources Interlinking

Data Searchery: A Case Study
  • Andrea Mannocci
  • Paolo Manghi
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 416)


The novel data-centric paradigm of e-Science has shown that interlinking publications and research data objects from different realms and data sources (e.g. publication repositories, data repositories) makes the dissemination, re-use, and validation of research activities more effective. Scholarly Communication Infrastructures (SCIs) are advocated for bridging such data sources by offering an overlay of services for the identification, creation, and navigation of relationships among objects of different nature. Since realizing and maintaining such infrastructures is in general very costly, in this paper we propose a lightweight approach for “preliminary analysis of data source interlinking” to help practitioners evaluate whether, and to what extent, realizing them can be effective. We present Data Searchery, a configurable tool delivering a service for relating objects across data sources, be they publications or research data, by identifying relationships between their metadata descriptions in real time.


Interoperability, Interlinking, Research data, Publications, Metadata, Inference



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Consiglio Nazionale delle Ricerche, Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Pisa, Italy
