A content-based citation analysis study based on text categorization

Scientometrics

Publications and citations are important components for measuring research performance. Academics receive incentives, tenure, or awards based on the number of citations they receive; however, using citations to evaluate research and researchers can give rise to unethical practices and manipulation. Consequently, the current approach to the use of citations needs to change. The main aim of this study was to conduct a content-based citation analysis of Turkish citations. To achieve this aim, 423 peer-reviewed articles published in the library and information science literature in Turkey, together with their 12,881 references and 101,019 sentences, were thoroughly examined. The citations were divided into four main categories: citation meaning, citation purpose, citation shape, and citation array. Each category was then further divided into sub-categories. A tagging process with inter-annotator agreement was conducted, and the citation categories of the citation sentences were determined. Weka software was used to apply the text categorization methods. The automatic citation sentence classification achieved a success rate of at least 90% for all citation classes, which showed that using computational linguistics to evaluate citation contexts and to develop new techniques was possible and gave more detailed results.
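
The tagging step described above relies on inter-annotator agreement. The abstract does not say which agreement statistic was used, so the following is only a minimal sketch of one common choice, Cohen's kappa, for two annotators. The function name `cohen_kappa`, the label names, and the six example tags are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical tags for six citation sentences (assumed, not from the study).
annotator_1 = ["positive", "neutral", "neutral", "negative", "neutral", "positive"]
annotator_2 = ["positive", "neutral", "positive", "negative", "neutral", "neutral"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one citation class dominates the tagged sentences.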

  1. Websites such as Essential Science Indicators, Highly Cited Researchers, and ScienceWatch present rankings of authors, institutions, and countries based on numbers of publications and citations.

  2. Citation counts are important indicators for tenure and incentives in Turkey. For example, authors whose publications have received high citation rates are supported by the Scientific Research Projects Coordination Unit of Hacettepe University to travel abroad for international conferences (Hacettepe Üniversitesi… 2015). In addition, the number of citations to publications is important for tenure and academic promotion (Öğretim Üyeliğine Yükseltilme… 1982). There is a separate section for citations in the “Academic Incentive Payment” given to academic staff working at state universities. Each citation is graded using different evaluation elements such as position, number of authors, and the citation’s origin (Akademik 2016).

  3. Evaluations based on the tags made by data entry operator 1.


  • Akademik Teşvik Ödeneği Yönetmeliği [Academic Incentive Regulation], T.C. Resmi Gazete [Official Gazette]. (13271644, 27.12.2016).

  • Al, U., & Soydal, İ. (2012). Dergi kendine atıfının etkisi: Energy Education Science and Technology örneği [The impact of journal self-citation: The case of Energy Education Science and Technology]. Türk Kütüphaneciliği [Turkish Librarianship], 26(4), 699–714.

  • Al, U., & Soydal, İ. (2014). Akademinin atıf dizinleri ile savaşı [The war of academia with citation indexes]. Hacettepe Üniversitesi Edebiyat Fakültesi Dergisi [Hacettepe University Journal of Faculty of Letters], 31(1), 23–42.

  • Al, U., & Soydal, İ. (2015). Bilimsel iletişimin farklı bir yüzü: Geri çekilen makaleler [The other face of scholarly communication: Retracted articles]. In U. Al & Z. Taşkın (Eds.), Prof. Dr. İrfan Çakın’a Armağan (pp. 22–37). Ankara: Hacettepe University.

  • Aljaber, B., Stokes, N., Bailey, J., & Pei, J. (2010). Document clustering of scientific texts using citation contexts. Information Retrieval, 13(2), 101–131.

  • Angrosh, M.A., Cranefield, S., & Stanger, N. (2010). Context identification of sentences in related work sections using a conditional random field: Towards intelligent digital libraries. In Proceedings of the ACM, JCDL’10 (pp. 293–302). Queensland: ACM.

  • Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555–596.

  • Arunachalam, S., & Manorama, K. (1988). Are citation-based quantitative techniques adequate for measuring science on the periphery? Scientometrics, 15(5–6), 393–408.

  • Athar, A. (2011). Sentiment analysis of citations using sentence structure-based features. In HLT-SS ‘11 Proceedings of the ACL 2011 student session (pp. 81–87). Stroudsburg: Association for Computational Linguistics.

  • Athar, A. (2014). Sentiment analysis of scientific citations (Technical report, UCAM-CL-TR-856). Cambridge: University of Cambridge Computer Laboratory.

  • Bertin, M. (2008). Categorizations and annotations of citation in research evaluation. 13. Natural language processing; 13.1 discourse.

  • Blake, C. (2013). Text mining. Annual Review of Information Science and Technology, 45(1), 121–125.

  • Bonzi, S. (1982). Characteristics of a literature as predictors of relatedness between cited and citing works. Journal of the American Society for Information Science, 33(4), 208–216.

  • Bornmann, L., & Daniel, H.-D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.

  • Brooks, A. T. (1986). Evidence of complex citer motivations. Journal of the American Society for Information Science, 37(1), 34–36.

  • Cano, V. (1989). Citation behavior: Classification, utility, and location. Journal of the American Society for Information Science, 40(4), 284–290.

  • Carter, G.M. (1974). Peer review, citations, and biomedical research policy: NIH grants to Medical School Faculty. Rand Report, R-1583. Santa Monica: Rand.

  • Cavalcanti, D.C., Prudêncio, R.B.C., Pradhan, S.S., Shah, J.Y. & Pietrobon, R.S. (2011). Good to be bad? Distinguishing between positive and negative citations in scientific impact. In 23rd IEEE international conference on tools with artificial intelligence (ICTAI) (pp. 156–162). Boca Raton: IEEE.

  • Chubin, D. E. (1980). Letter to editor: Is citation analysis a legitimate evaluation tool? Scientometrics, 2(1), 91–94.

  • Cole, J. R. (2000). A short history of the use of citations as a measure of the impact of scientific and scholarly work. In B. Cronin & H. B. Atkins (Eds.), The web of knowledge: A festschrift in honor of Eugene Garfield (pp. 281–300). New Jersey: Information Today.

  • Cole, J. R., & Cole, S. (1971). Measuring the quality of sociological research: Problems in the use of the Science Citation Index. The American Sociologist, 6, 23–29.

  • Cole, J. R., & Cole, S. (1972). The Ortega hypothesis: Citation analysis suggests that only a few scientists contribute to scientific progress. Science, 178(4059), 368–375.

  • COPE. (2012). Citation manipulation.

  • Cozzens, S. E. (1985). Comparing the sciences: Citation context analysis of papers from neuropharmacology and the sociology of science. Social Studies of Science, 15, 127–153.

  • Cronin, B. (1981). The need for a theory of citing. Journal of Documentation, 37(1), 16–24.

  • Damashek, M. (1995). Gauging similarity with n-grams: Language-independent categorization of text. Science, 267(5199), 843–848.

  • Davis, P. (2017). Citation cartel or editor gone rogue? [BlogPost]. Scholarly Kitchen.

  • Ding, Y., Liu, X., Guo, C., & Cronin, B. (2013). The distribution of references across texts: Some implications for citation analysis. Journal of Informetrics, 7, 583–592.

  • Ding, Y., Zhang, G., Chambers, T., Song, M., Wang, X., & Zhai, C. (2014). Content-based citation analysis: The next generation of citation analysis. Journal of the Association for Information Science and Technology, 65(9), 1820–1833.

  • Dong, C., & Schäfer, U. (2011). Ensemble-style self-training on citation classification. In 5th international joint conference on natural language processing, IJCNLP 2011 (pp. 623–631). Chiang Mai: AFNLP.

  • Elkiss, E., Shen, S., Fader, A., Erkan, G., States, D., & Radev, D. (2008). Blind men and elephants: What do citation summaries tell us about a research article? Journal of the American Society for Information Science and Technology, 59(1), 51–62.

  • Fu, L. D., & Aliferis, C. F. (2010). Using content-based and bibliometric features for machine learning models to predict citation counts in the biomedical literature. Scientometrics, 85, 257–270.

  • Garfield, E. (1970). Can citation indexing be automated? Essays of an Information Scientist, 1, 84–90.

  • Garfield, E. (1973). Citation frequency as a measure of research activity and performance. Essays of an Information Scientist, 1, 406–408.

  • Garfield, E. (1979). Is citation analysis a legitimate evaluation tool? Scientometrics, 1(4), 359–375.

  • Goudsmit, S. A. (1974). Citation analysis. Science, 183(4120), 28.

  • Hacettepe Üniversitesi Bilimsel Araştırma Projeleri Koordinasyon Birimi Uygulama Esasları ve Araştırmacı Bilgilendirme Kılavuzu [Implementation Guideline of Hacettepe University Scientific Research Projects Coordination Unit and Information for Researchers]. (2015).

  • Halevi, G., & Bar-Ilan, J. (2016). Post retraction citations in context. In G. Cabanac, M. K. Chandrasekaran, I. Frommholz, K. Jaidka, M. Y. Kan, P. Mayr, & D. Wolfram (Eds.), BIRNDL 2016 bibliometric-enhanced information retrieval and natural language processing for digital libraries (pp. 23–29). Newark: CEUR.

  • Herlach, G. (1978). Can retrieval of information from citation indexes be simplified?: Multiple mention of a reference as a characteristic of the link between cited and citing article. Journal of the American Society for Information Science, 29(6), 308–310.

  • Jha, R., Jbara, A.-A., Qazvinian, V., & Radev, D. R. (2016). NLP-Driven citation analysis for scientometrics. Natural Language Engineering, 23(1), 93–130.

  • Johnson, C. A. (1985). Citations to authority in supreme court opinions. Law and Policy, 7(4), 509–523.

  • Kaplan, A. (2013). Üniversitelerde bilimsel yayın çalışmaları [Studies on scientific publications in universities] [Presentation]. Bilimsel Dergilerimiz ve Uluslararası İndekslerdeki Yeri Çalıştayı [Workshop on Our Scientific Journals and Their Roles on International Indexes].

  • Kaplan, P. (2014). Akademisyenlerin atıf çetesi [Citation gang of academics]. HaberTürk [news].

  • Kochen, M. (1974). Principles of information retrieval. Los Angeles: Melville.

  • Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI’95 proceedings of the 14th international joint conference on artificial intelligence (pp. 1137–1143). Montreal: ACM.

  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.

  • Lerner, J., & Wulf, J. (2007). Innovation and incentives: Evidence from corporate R&D. The Review of Economics and Statistics, 89(4), 634–644.

  • Liu, S., Chen, C., Ding, K., Wang, B., Xu, K., & Lin, Y. (2014). Literature retrieval based on citation context. Scientometrics, 101(2), 1293–1307.

  • Liu, Y., Yan, R. & Yan, H. (2013). Guess what you will cite: Personalized citation recommendation based on users’ preference. In R. E. Banchs, F. Silvestri, T.-Y. Liu, M. Zhang, S. Gao, & J. Lang (Eds.) Information retrieval technology: 9th Asia information retrieval societies conference, AIRS 2013 Singapore, December 2013 Proceedings (pp. 428–239). Heidelberg: Springer.

  • MacRoberts, M. H., & MacRoberts, B. R. (1996). Problems of citation analysis. Scientometrics, 36(3), 435–444.

  • Maričić, S., Spaventi, J., Pavičić, L., & Pifat-Mrzljak, G. (1998). Citation context versus the frequency counts of citation histories. Journal of the American Society for Information Science, 49(6), 530–540.

  • Markey, K. & Cochrane, P.A. (1981). Online training and practice manual for ERIC database searchers. New York: ERIC Clearing House on Information Sciences.

  • Matthew 25:29. (2004).

  • Merton, R. K. (1968). The Matthew Effect in science. Science, 159, 56–63.

  • Miller, J. C., Coble, K. H., & Lusk, J. L. (2013). Evaluating top faculty researchers and the incentives that motivate them. Scientometrics, 97, 519–533.

  • Moravcsik, M. J., & Murugesan, P. (1975). Some results on the function and quality of citations. Social Studies of Science, 5, 86–92.

  • Öğretim Üyeliğine Yükseltilme ve Atanma Yönetmeliği [Regulation on Promotion and Appointment to Instructional Membership]. (1982, 28 January). Resmi Gazete [Official Gazette] (Number: 17588).

  • Oppenheim, C. (1996). Do citations count? Citation indexing and the research assessment exercise (RAE). Serials, 9(2), 155–161.

  • Oransky, I. (2017). Citation-boosting episode leads to editors’ resignations, university investigation [BlogPost]. Retraction Watch.

  • Öztürk, K. (2012). Şişme dergiler ve etik ihlalleri [Bloated journals and ethical violations] [BlogPost].

  • Öztürk, K. (2013). Şişme dergiler, yeniden [Bloated journals, again] [Blogpost].

  • Price, D. J. D. S. (1986). Little science, big science… and beyond. New York: Columbia University Press.

  • Ritchie, A. (2008). Citation context analysis for information retrieval. Ph.D. Dissertation, University of Cambridge.

  • Schneider, J.W. & Borlund, P. (2005). A bibliometric-based semi-automatic approach to identification of candidate thesaurus terms: Parsing and filtering of noun phrases from citation contexts. In F. Crestani & I. Ruthven (Eds.) Context: Nature, impact and role, 5th international conference on conceptions of library and information sciences, CoLIS 2005 Glasgow, UK, June 2005, Proceedings (pp. 226–237). Heidelberg: Springer.

  • Sendhilkumar, S., Elakkiya, E., & Mahalakshmi, G.S. (2013). Citation semantic based approaches to identify article quality. In D. C. Wyld (Ed.), Computer science and information technology (CS & IT) (pp. 411–420). Delhi: ICCSEA.

  • Shum, S. B. (1998). Evolving the web for scientific knowledge: First step towards an “HCI knowledge web”. Interfaces, British HCI Group Magazine, 39, 16–21.

  • Silva, J. A. T., & Dobránszki, J. (2017). Highly cited retracted papers. Scientometrics, 110(3), 1653–1661.

  • Simkin, M. V., & Roychowdhury, V. P. (2003). Read before you cite! Complex Systems, 14, 269–274.

  • Simkin, M. V., & Roychowdhury, V. P. (2006). Do you sincerely want to be cited? Or: Read before you cite. Significance, 3(4), 179–181.

  • Smith, L. C. (1981). Citation analysis. Library Trends, 30, 83–106.

  • Spiegel-Rösing, I. (1977). Science studies: Bibliometric and content analysis. Social Studies of Science, 7(1), 97–113.

  • Stigler, S. M. (1980). Stigler’s law of eponymy. Transactions. New York Academy of Sciences, 39(1), 147–157.

  • Suppe, F. (1998). The structure of a scientific paper. Philosophy of Science, 65(3), 381–405.

  • Tandon, N. & Jain, A. (2012). Citation context sentiment analysis for structured summarization of research papers.

  • Taşkın, Z. (2017). İçerik tabanlı atıf analizi modeli tasarımı: Türkçe atıflar için metin kategorizasyonuna dayalı bir uygulama (Designing a model for content-based citation analysis: an application for Turkish citations based on text categorization). Unpublished Ph.D. Dissertation, Hacettepe University.

  • Testa, J. (2008). Regional content expansion update: Web of Science 5.0.

  • Teufel, S. (1999). Argumentative zoning: Information extraction from scientific text. Unpublished Ph.D. Dissertation, University of Edinburgh.

  • Teufel, S., Siddharthan, A. & Tidhar, D. (2006). Automatic classification of citation function.

  • Title Suppressions. (2016).

  • Tonta, Y. (2014). Akademik performans, öğretim üyeliğine yükseltme ve yayın destekleme ölçütleriyle ilgili bir değerlendirme [An evaluation of criteria on academic performance, tenure and publication support].

  • Van Raan, A.F.J. (2004). Measuring science: Capita selecta of current main issues. In H. F. Moed, W. Glänzel, & U. Schmoch, (Eds.) Handbook of quantitative science and technology research (pp. 15–50). Dordrecht: Kluwer Academic.

  • Van Raan, A. F. J. (2005). Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods. Scientometrics, 62(1), 133–143.

  • Vinkler, P. (1994). Words and indicators: As scientometrics stands. Scientometrics, 30(2), 495–504.

  • Voos, H., & Dagaev, K. S. (1976). Are all citations equal? Or did we Op. Cit. your Idem? The Journal of Academic Librarianship, 1(6), 19–21.

  • Wetterer, J. K. (2006). Quotation error, citation copying, and ant extinctions in Madeira. Scientometrics, 67(3), 351–372.

  • Woolgar, S. (1991). Beyond the citation debate: Towards a sociology of measurement technologies and their use in science policy. Science and Public Policy, 18(5), 319–326.

  • Xu, J., Zhang, Y., Wu, Y., Wang, J., Dong, X., & Xu, H. (2015). Citation sentiment analysis in clinical trial papers. AMIA Annual Symposium Proceedings, 2015, 1334–1341.

  • Yu, B. (2013). Automated citation sentiment analysis: What can we learn from biomedical researchers? ASIS&T 2013 Annual Meeting Montréal, Québec, Canada, November 1–5, 2013.

  • Zhu, X., Turney, P., Lemire, D., & Vellino, A. (2015). Measuring academic influence: Not all citations are equal. Journal of the Association for Information Science and Technology, 66(2), 408–427.

  • Ziman, J. M. (1968). Public knowledge: An essay concerning the social dimension of science. Cambridge: Cambridge University Press.

  • Zipf, G. (1949). Human behavior and the principle of least effort. Cambridge: Addison-Wesley Press.


This article is based on Taşkın’s (2017) Ph.D. dissertation and was supported in part by a research grant from the Turkish Scientific and Technological Research Center (115K440).

Correspondence to Zehra Taşkın.

Taşkın, Z., Al, U. A content-based citation analysis study based on text categorization. Scientometrics 114, 335–357 (2018).
