Semi-automated Application Profile Generation for Research Data Assets

  • João Rocha da Silva
  • Cristina Ribeiro
  • João Correia Lopes
Part of the Communications in Computer and Information Science book series (CCIS, volume 343)


Selecting the right set of descriptors to annotate a specific dataset is a hard problem in research data management. For a dataset in an arbitrary domain, an application profile is difficult to build because of the abundance of metadata standards, ontologies and other descriptor sources available across domains. We propose to partially automate data description by generating application profile recommendations from a knowledge base of research data assets. Our approach builds on existing technologies for exploring linked data and yields a process that can be tightly coupled with the research workflow, giving researchers more control over the description of their data. Preliminary experiments show that state-of-the-art search indexes, graph databases and triple stores can be used to explore existing sources of linked data for profile generation.
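The recommendation idea can be illustrated with a minimal sketch. This is not the authors' method: it assumes a hypothetical toy knowledge base that maps each data asset to the descriptors already used to annotate it, and it ranks candidate descriptors with a simple overlap-weighted co-occurrence score, loosely in the spirit of neighbour-based link prediction. The asset names, the `KNOWLEDGE_BASE` contents and the `recommend_descriptors` function are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy knowledge base (an assumption, not data from the paper):
# each research data asset maps to the descriptors already used to annotate it.
KNOWLEDGE_BASE = {
    "gravimetry_survey": {"dcterms:title", "dcterms:creator",
                          "dcterms:spatial", "dcterms:date"},
    "seismic_log": {"dcterms:title", "dcterms:creator", "dcterms:spatial"},
    "interview_corpus": {"dcterms:title", "dcterms:creator",
                         "foaf:name", "dcterms:language"},
}


def recommend_descriptors(partial, knowledge_base, top_n=3):
    """Rank descriptors missing from `partial` by overlap-weighted co-occurrence.

    Each known asset votes for the descriptors it uses that the new asset
    lacks; the vote weight is the number of descriptors the two share.
    """
    scores = Counter()
    for descriptors in knowledge_base.values():
        overlap = len(partial & descriptors)
        if overlap == 0:
            continue  # assets with nothing in common contribute no evidence
        for candidate in descriptors - partial:
            scores[candidate] += overlap
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    # A researcher has so far filled in only a title and a spatial extent.
    print(recommend_descriptors({"dcterms:title", "dcterms:spatial"},
                                KNOWLEDGE_BASE))
```

In this sketch the researcher stays in control: the ranked list is only a suggestion for extending the profile, and a real system would draw its candidates from linked data sources rather than from a hard-coded dictionary.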


Link Prediction, Graph Database, Metadata Standard, Triple Store, Metadata Descriptor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • João Rocha da Silva (1)
  • Cristina Ribeiro (2)
  • João Correia Lopes (2)
  1. Faculdade de Engenharia da Universidade do Porto / INESC TEC, Portugal
  2. DEI, Faculdade de Engenharia da Universidade do Porto / INESC TEC, Portugal
