DSCrank: A Method for Selection and Ranking of Datasets

  • Yasmmin Cortes Martins
  • Fábio Faria da Mota
  • Maria Cláudia Cavalcanti
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 672)


Considerable effort has gone into building the Web of Data. One of the main challenges is identifying the datasets most closely related to a given dataset, so that it can be linked to them. Another is publishing a local dataset into the Web of Data in accordance with the Linked Data principles. The present work is based on the idea that a set of activities should guide the user through the publication of a new dataset into the Web of Data. It presents the specification and implementation of two initial activities: crawling and ranking a selected set of existing published datasets. The proposed implementation adapts the focused crawling approach to the Linked Data principles, and the dataset ranking is based on a quick glimpse into the content of the selected datasets. The paper also presents a case study in the biomedical area to validate the implemented approach, which shows promising results with respect to scalability and performance.
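The two activities named in the abstract, focused crawling followed by ranking, can be illustrated with a minimal sketch. This is not the DSCrank implementation: the in-memory `CATALOGUE`, the term-overlap `relevance` score, the `threshold` parameter, and all dataset names are hypothetical stand-ins chosen for illustration. A real crawler would follow links between published datasets and sample their content over HTTP or SPARQL.

```python
import heapq
import re

# Hypothetical toy catalogue: dataset name -> (content sample, outgoing links).
# In practice the sample would come from a "quick glimpse" at a live dataset.
CATALOGUE = {
    "seed": ("gene protein disease annotations", ["bio_a", "geo", "bio_b"]),
    "bio_a": ("protein sequences and gene ontology terms", ["bio_b"]),
    "bio_b": ("disease phenotypes linked to gene variants", []),
    "geo": ("rivers mountains and city populations", ["music"]),
    "music": ("albums artists and record labels", []),
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relevance(sample, topic_terms):
    """Fraction of topic terms found in the sample (assumed scoring rule)."""
    if not topic_terms:
        return 0.0
    return len(set(tokenize(sample)) & topic_terms) / len(topic_terms)


def focused_crawl(seed, topic_terms, threshold=0.2, max_visits=100):
    """Best-first crawl that only follows links from on-topic datasets."""
    topic = set(topic_terms)
    visited = {}
    # Entries are (-priority, name) so the most promising dataset pops first.
    frontier = [(-1.0, seed)]
    while frontier and len(visited) < max_visits:
        _, name = heapq.heappop(frontier)
        if name in visited or name not in CATALOGUE:
            continue
        sample, links = CATALOGUE[name]
        score = relevance(sample, topic)
        visited[name] = score
        if score < threshold:
            continue  # off-topic dataset: record it, but do not expand it
        for link in links:
            if link not in visited:
                # A link inherits the score of the dataset that points to it.
                heapq.heappush(frontier, (-score, link))
    return visited


def rank(scores, exclude=()):
    """Order datasets by descending score, breaking ties by name."""
    candidates = [(n, s) for n, s in scores.items() if n not in exclude]
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


scores = focused_crawl("seed", ["gene", "protein", "disease"])
for name, score in rank(scores, exclude={"seed"}):
    print(f"{name}: {score:.2f}")
```

In this toy run the off-topic `geo` dataset is visited but not expanded, so `music` is never reached. That pruning is what distinguishes a focused crawl from an exhaustive one.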


Keywords (machine-generated, not supplied by the authors): Resource Description Framework; SPARQL Query; Publishing Dataset; External Link; Relevance Analysis



This work was partially funded by a CAPES scholarship, CNPq (proc. 307647/2012-9), and FAPERJ (proc. E-26/111.147/2011).



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Yasmmin Cortes Martins (1, 3)
  • Fábio Faria da Mota (2)
  • Maria Cláudia Cavalcanti (1)

  1. Military Institute of Engineering, Rio de Janeiro, Brazil
  2. IOC/FIOCRUZ, Rio de Janeiro, Brazil
  3. National Laboratory of Scientific Computing, Petrópolis, Brazil
