Supporting Description of Research Data: Evaluation and Comparison of Term and Concept Extraction Approaches
The importance of research data management is widely recognized. Dendro is an ontology-based platform that allows researchers to describe datasets using generic and domain-specific descriptors from ontologies. Selecting or building the right ontologies for each research domain or group requires meetings between curators and researchers in order to capture the main concepts of their research. Envisioning a tool to assist curators through the automatic extraction of key concepts from research documents, we propose 2 concept extraction methods and compare them with a term extraction method. To compare the three approaches, we use as ground truth an ontology previously created by human curators.
KeywordsTerm extraction Ontology learning Research data management
- 1.Amorim, R.C., Castro, J.A., da Silva, J.R., Ribeiro, C.: A comparative study of platforms for research data management: interoperability, metadata capabilities and integration potential. In: Rocha, A., Correia, A.M., Costanzo, S., Reis, L.P. (eds.) New Contributions in Information Systems and Technologies. AISC, vol. 353, pp. 101–111. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16486-1_10CrossRefGoogle Scholar
- 2.Castro, J.A., Perrotta, D., Amorim, R.C., da Silva, J.R., Ribeiro, C.: Ontologies for research data description: a design process applied to vehicle simulation. In: Garoufallou, E., Hartley, R.J., Gaitanou, P. (eds.) MTSR 2015. CCIS, vol. 544, pp. 348–354. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24129-6_30CrossRefGoogle Scholar
- 5.Rocha, J., Ribeiro, C., Lopes, J.: Ranking Dublin Core descriptor lists from user interactions: a case study with Dublin Core Terms using the Dendro platform. Int. J. Digital Libr. (2018). https://doi.org/10.1007/s00799-018-0238-x