A pragmatic approach to hierarchical categorization of research expertise in the presence of scarce information

International Journal on Digital Libraries

Abstract

Throughout the history of science, different knowledge areas have collaborated to overcome major research challenges. Associating a researcher with such areas enables several tasks, such as the organization of digital repositories, expertise recommendation and the formation of research groups for complex problems. In this article, we propose a simple yet effective automatic classification model that categorizes research expertise according to a knowledge area classification scheme. Our proposal relies on discriminatory evidence provided by the title of academic works, which is the minimum information capable of relating a researcher to their knowledge area. Our experiments show that supervised machine learning methods trained with manually labeled information can produce effective classification models.
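As a rough illustration of the idea summarized above (supervised classification of titles into a hierarchical knowledge area scheme), the following minimal sketch trains a top-down classifier on a handful of labeled titles. The abstract does not name the learning method, so the multinomial Naive Bayes used here is only a stand-in; the two-level scheme, the area names, the example titles and the class names `TitleNaiveBayes` and `TopDownClassifier` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(title):
    """Lowercase a title and split it on whitespace."""
    return title.lower().split()


class TitleNaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over title tokens.

    A stand-in learner chosen for brevity, not the method of the article.
    """

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for title, label in zip(titles, labels):
            self.token_counts[label].update(tokenize(title))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, title):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # class prior
            for tok in tokenize(title):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TopDownClassifier:
    """Predict the top-level area first, then an area within that branch."""

    def fit(self, titles, paths):
        self.root = TitleNaiveBayes().fit(titles, [p[0] for p in paths])
        by_parent = defaultdict(lambda: ([], []))
        for title, (parent, child) in zip(titles, paths):
            by_parent[parent][0].append(title)
            by_parent[parent][1].append(child)
        self.children = {
            parent: TitleNaiveBayes().fit(ts, cs)
            for parent, (ts, cs) in by_parent.items()
        }
        return self

    def predict(self, title):
        parent = self.root.predict(title)
        return parent, self.children[parent].predict(title)


# Hypothetical manually labeled titles: (title, (top-level area, area)).
TRAINING = [
    ("indexing methods for large digital libraries", ("Exact Sciences", "Computer Science")),
    ("query processing in distributed databases", ("Exact Sciences", "Computer Science")),
    ("quantum entanglement in photon pairs", ("Exact Sciences", "Physics")),
    ("thermal properties of superconducting materials", ("Exact Sciences", "Physics")),
    ("clinical trial of hypertension treatment", ("Health Sciences", "Medicine")),
    ("surgical outcomes in cardiac patients", ("Health Sciences", "Medicine")),
    ("nursing care for elderly patients", ("Health Sciences", "Nursing")),
    ("patient education in nursing practice", ("Health Sciences", "Nursing")),
]

model = TopDownClassifier().fit(
    [title for title, _ in TRAINING], [path for _, path in TRAINING]
)
print(model.predict("distributed indexing of digital libraries"))
```

A top-down design keeps each decision small: the second-level classifier only has to separate areas that share a parent. The trade-off is that an error at the top level cannot be corrected further down.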

Notes

  1. http://dl.acm.org.

  2. http://dblp.uni-trier.de.

  3. https://www.ncbi.nlm.nih.gov/pubmed/.

  4. http://www.ndltd.org.

  5. http://lattes.cnpq.br.

  6. http://www.cnpq.br/documents/10157/186158/TabeladeAreasdoConhecimento.pdf.

  7. In this article we use the terms classification and categorization interchangeably.

  8. Available at: http://www.lbd.dcc.ufmg.br/lbd/collections/hierarchical-categorization-of-research-expertise.

  9. http://scikit-learn.org.

  10. Literal translation from the original title in Portuguese: A UNESCO e o Mundo da Cultura.

Acknowledgements

This work was partially funded by project MASWeb (Grant FAPEMIG/PRONEX APQ-01400-14) and by the authors’ individual Grants from CAPES, CNPq and FAPEMIG.

Author information

Corresponding authors

Correspondence to Gustavo Oliveira de Siqueira or Alberto H. F. Laender.

About this article

Cite this article

de Siqueira, G.O., Canuto, S., Gonçalves, M.A. et al. A pragmatic approach to hierarchical categorization of research expertise in the presence of scarce information. Int J Digit Libr 21, 61–73 (2020). https://doi.org/10.1007/s00799-018-0260-z

  • DOI: https://doi.org/10.1007/s00799-018-0260-z
