Volume 118, Issue 1, pp 21–43

Identification of important citations by exploiting research articles’ metadata and cue-terms from content

  • Faiza Qayyum
  • Muhammad Tanvir Afzal


Citations play a pivotal role in indicating various aspects of scientific literature. Quantitative citation analysis approaches have been used for decades to measure the impact factor of journals, to rank researchers and institutions, and to discover evolving research topics, among other uses. Researchers have questioned purely quantitative citation analysis, arguing that not all citations are equally important and that the reasons for citing must be considered when counting. In the recent past, researchers have focused on grouping citation reasons into important and non-important classes rather than classifying each reason individually. Most contemporary citation classification techniques either rely on the full content of articles or are dominated by content-based features. However, most of the time content is not freely available, because many journal publishers do not provide open access to articles. This paper presents a binary citation classification scheme that is dominated by metadata-based parameters. The study demonstrates the significance of metadata-based and content-based parameters in varying scenarios. The experiments are performed on two annotated data sets, which are evaluated using Support Vector Machine (SVM), Kernel Logistic Regression (KLR), and Random Forest classifiers. The results are compared with a contemporary study that performed a similar classification using a rich list of content-based features. The comparison revealed that the proposed model attained an improved precision (0.68) while relying only on freely available metadata. We claim that the proposed approach can serve as the best alternative in scenarios wherein content is unavailable.
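As a rough illustration of the metadata-driven setup described above, the following Python sketch builds a feature vector for a citing/cited pair from metadata alone and computes precision for the "important" class. The `Paper` fields, the Jaccard-overlap features, and all function names are assumptions made for illustration; they are not the parameters used in the paper, and the SVM, KLR, and Random Forest classifiers are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Minimal metadata record; the fields are illustrative assumptions."""
    title: str
    authors: frozenset
    keywords: frozenset


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def metadata_features(citing, cited):
    """Hypothetical metadata-only feature vector for one citation pair."""
    title_a = frozenset(citing.title.lower().split())
    title_b = frozenset(cited.title.lower().split())
    return [
        jaccard(title_a, title_b),                 # title term overlap
        jaccard(citing.authors, cited.authors),    # shared authors
        jaccard(citing.keywords, cited.keywords),  # shared keywords
    ]


def precision(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP) for the positive ('important') class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy records, invented purely for this example.
citing = Paper(
    "Identifying important citations using metadata",
    frozenset({"A"}),
    frozenset({"citation", "metadata"}),
)
cited = Paper(
    "Identifying meaningful citations",
    frozenset({"B"}),
    frozenset({"citation"}),
)
features = metadata_features(citing, cited)

# Toy labels: 1 = important citation, 0 = non-important citation.
toy_precision = precision([1, 0, 1, 1, 0], [1, 1, 1, 0, 0])
```

In a fuller pipeline, vectors such as `features` would be passed to a trained binary classifier, and `precision` would be computed over its predictions on a held-out annotated set.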


Citation classification, Metadata, Information retrieval, Support vector machine, Kernel logistic regression, Random forest


Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2018

Authors and Affiliations

  1. Department of Computer Science, Capital University of Science and Technology, Islamabad, Pakistan
