The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas

  • Angelo A. Salatino
  • Thiviyan Thanapalasingam
  • Andrea Mannocci
  • Francesco Osborne
  • Enrico Motta
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11137)

Abstract

Ontologies of research areas are important tools for characterising, exploring, and analysing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 26K topics and 226K semantic relationships. It was created by applying the Klink-2 algorithm to a very large dataset of 16M scientific articles. CSO presents two main advantages over the alternatives: (i) it includes a very large number of topics that do not appear in other classifications, and (ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO, we have developed the CSO Portal, a web application that enables users to download, explore, and provide granular feedback on CSO at different levels. Users can use the portal to rate topics and relationships, suggest missing relationships, and visualise sections of the ontology. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various communities engaged with scholarly data.
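
To make the idea of a topic ontology supporting publication classification more concrete, the following is a minimal sketch in Python. It is not the authors' method and does not use CSO data: the topic labels, the single "broader" relation, and the helper names `build_parent_index`, `ancestors`, and `classify` are all hypothetical, and plain substring matching stands in for whatever classification technique a real tool would use. It only illustrates how hierarchical relationships let a document tagged with a narrow topic inherit its broader research areas.

```python
from collections import defaultdict

# Hypothetical toy data: (broader topic, narrower topic) pairs.
# The labels and the single "broader" relation are illustrative
# assumptions, not entries taken from CSO.
BROADER_PAIRS = [
    ("computer science", "artificial intelligence"),
    ("artificial intelligence", "machine learning"),
    ("machine learning", "neural networks"),
    ("computer science", "semantic web"),
    ("semantic web", "ontology learning"),
    ("computer science", "databases"),
]


def build_parent_index(pairs):
    """Map each topic to the set of its direct broader topics."""
    parents = defaultdict(set)
    for broader, narrower in pairs:
        parents[narrower].add(broader)
    return parents


def ancestors(topic, parents):
    """Return every topic reachable by following broader links upward."""
    seen = set()
    stack = [topic]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def classify(text, parents, all_topics):
    """Tag text with topics whose label occurs in it, plus their ancestors.

    Substring matching is a deliberate simplification for this sketch.
    """
    lowered = text.lower()
    matched = {topic for topic in all_topics if topic in lowered}
    expanded = set(matched)
    for topic in matched:
        expanded |= ancestors(topic, parents)
    return expanded


parents = build_parent_index(BROADER_PAIRS)
all_topics = {topic for pair in BROADER_PAIRS for topic in pair}
tags = classify(
    "A study of neural networks for ontology learning", parents, all_topics
)
print(sorted(tags))
```

In this toy example the text mentions only two narrow topics, yet the broader areas above them are added automatically, while the unrelated "databases" branch is left out. A finer-grained taxonomy gives this kind of expansion more to work with, which is the motivation the abstract gives for a large-scale ontology.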

Keywords

  • Scholarly data
  • Ontology learning
  • Bibliographic data
  • Scholarly ontologies

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Angelo A. Salatino (1)
  • Thiviyan Thanapalasingam (1)
  • Andrea Mannocci (1)
  • Francesco Osborne (1)
  • Enrico Motta (1)

  1. Knowledge Media Institute, The Open University, Milton Keynes, UK
