Automatic Hierarchical Categorization of Research Expertise Using Minimum Information

  • Gustavo Oliveira de Siqueira
  • Sérgio Canuto
  • Marcos André Gonçalves
  • Alberto H. F. Laender
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10450)


Throughout the history of science, different knowledge areas have collaborated to overcome major research challenges. Associating researchers with these areas enables a range of tasks, such as organizing digital repositories, recommending expertise, and forming research groups to tackle complex problems. In this paper, we propose a simple yet effective automatic classification model capable of categorizing research expertise according to a hierarchical knowledge area classification scheme. Our proposal relies on discriminative evidence provided by the titles of academic works, which are the minimum information capable of relating a researcher to his or her knowledge area. We also evaluate the use of learning-to-rank as an effective means of ranking experts with minimum information. Our experiments show that supervised machine learning methods trained on manually labeled data can produce effective classification and ranking models.
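The abstract does not state which classifier the authors use or how the hierarchy is traversed. Purely as an illustration of categorizing expertise from titles alone, here is a minimal Python sketch under several assumptions: a two-level (area, subarea) scheme, a top-down strategy in which a root model picks the area and a per-area model picks the subarea, and multinomial Naive Bayes as a stand-in classifier. The class names, the toy titles and labels, and the majority-vote helper `categorize_researcher` are all hypothetical and are not the paper's method.

```python
import math
from collections import Counter, defaultdict


def tokenize(title: str) -> list[str]:
    """Lowercase a title and split it on any non-alphanumeric character."""
    return "".join(c if c.isalnum() else " " for c in title.lower()).split()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: dict[str, Counter] = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {t for tokens in docs for t in tokens}
        self.n_docs = len(labels)
        return self

    def predict(self, tokens: list[str]) -> str:
        def log_posterior(label: str) -> float:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.class_counts[label] / self.n_docs)
            for t in tokens:
                if t in self.vocab:  # words unseen in training carry no evidence
                    score += math.log((counts[t] + 1) / denom)
            return score

        return max(self.class_counts, key=log_posterior)


class TopDownClassifier:
    """Hypothetical two-level top-down classifier: a root model picks the major
    area, then a model trained only on that area's titles picks the subarea."""

    def fit(self, titles: list[str], paths: list[tuple[str, str]]) -> "TopDownClassifier":
        docs = [tokenize(t) for t in titles]
        self.root = NaiveBayes().fit(docs, [area for area, _ in paths])
        by_area: dict[str, list[tuple[list[str], str]]] = defaultdict(list)
        for tokens, (area, sub) in zip(docs, paths):
            by_area[area].append((tokens, sub))
        self.children = {
            area: NaiveBayes().fit([d for d, _ in items], [s for _, s in items])
            for area, items in by_area.items()
        }
        return self

    def predict(self, title: str) -> tuple[str, str]:
        tokens = tokenize(title)
        area = self.root.predict(tokens)
        return area, self.children[area].predict(tokens)


def categorize_researcher(model: TopDownClassifier, titles: list[str]) -> tuple[str, str]:
    """Hypothetical helper: assign a researcher the (area, subarea) predicted
    most often across the titles of his or her works."""
    votes = Counter(model.predict(t) for t in titles)
    return votes.most_common(1)[0][0]


# Hypothetical toy training data: (title, (area, subarea)).
TRAIN = [
    ("Ranking documents for web search", ("Computer Science", "Information Retrieval")),
    ("Query expansion in search engines", ("Computer Science", "Information Retrieval")),
    ("Neural networks for image classification", ("Computer Science", "Machine Learning")),
    ("Kernel methods for classification", ("Computer Science", "Machine Learning")),
    ("Gene expression in tumor cells", ("Biology", "Genetics")),
    ("Gene variants and disease risk", ("Biology", "Genetics")),
    ("Enzyme kinetics of membrane proteins", ("Biology", "Biochemistry")),
    ("Protein folding and enzyme activity", ("Biology", "Biochemistry")),
]

model = TopDownClassifier().fit([t for t, _ in TRAIN], [p for _, p in TRAIN])

print(model.predict("Web search ranking"))
# ('Computer Science', 'Information Retrieval')
print(categorize_researcher(model, [
    "Web search ranking", "Query ranking for search engines", "Kernel classification",
]))
# ('Computer Science', 'Information Retrieval')
```

The top-down design keeps each decision small (a few areas, then a few subareas), at the cost that an error at the root cannot be corrected at the lower level. A real system would need far more labeled titles than this toy set.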


Keywords: Research expertise categorization · Classification schemes · Supervised classification · Learning-to-rank



This work was partially funded by projects InWeb (grant MCT/CNPq 573871/2008-6) and MASWeb (grant FAPEMIG/PRONEX APQ-01400-14), and by the authors’ individual grants from CAPES, CNPq and FAPEMIG.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Gustavo Oliveira de Siqueira (1)
  • Sérgio Canuto (1)
  • Marcos André Gonçalves (1)
  • Alberto H. F. Laender (1)
  1. Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
