Volume 87, Issue 3, pp 695–706

Mining typical features for highly cited papers

  • Mingyang Wang
  • Guang Yu
  • Daren Yu


In this paper, we discuss the application of data mining tools to identify typical features of highly cited papers (HCPs). The feature space used to model HCPs was established by integrating papers’ external features and quality features. A series of predictor teams was then extracted from the feature space using a rough set reduction framework, and each predictor team was used to construct a base classifier. The five base classifiers that together offered the highest classification performance and the greatest overall diversity were selected to construct a multi-classifier system (MCS) for HCPs. The combined prediction model performed better than models built on a single predictor team. Eleven typical prediction features for HCPs were extracted on the basis of the MCS. The findings show that both a paper’s inner quality and its external features, mainly represented by the reputation of the authors and journals, contribute to whether it becomes an HCP.
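The selection and combination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how diversity is measured or how the base classifiers are combined, so the sketch assumes pairwise disagreement as the diversity measure and simple majority voting as the combination rule. The function names, the `team_*` labels, and the toy predictions are all hypothetical, and the rough set reduction that produces the predictor teams is not shown.

```python
from itertools import combinations


def accuracy(pred, truth):
    """Fraction of samples where the prediction matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers give different labels
    (an assumed diversity measure, not necessarily the one used in the paper)."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def select_team(preds, truth, k, n_candidates):
    """Keep the n_candidates most accurate classifiers, then return the
    k-subset (k >= 2) with the highest mean pairwise disagreement."""
    ranked = sorted(preds, key=lambda name: accuracy(preds[name], truth), reverse=True)
    candidates = ranked[:n_candidates]

    def mean_diversity(subset):
        pairs = list(combinations(subset, 2))
        return sum(disagreement(preds[a], preds[b]) for a, b in pairs) / len(pairs)

    return max(combinations(candidates, k), key=mean_diversity)


def majority_vote(preds, team):
    """Combine the selected classifiers' binary labels by simple majority."""
    n_samples = len(preds[team[0]])
    return [int(2 * sum(preds[name][i] for name in team) > len(team))
            for i in range(n_samples)]


# Toy validation data: 1 = highly cited, 0 = not highly cited (made up).
truth = [1, 0, 1, 1, 0, 0, 1, 0]
preds = {
    "team_a": [1, 0, 1, 1, 0, 0, 0, 0],
    "team_b": [1, 0, 1, 0, 0, 0, 1, 0],
    "team_c": [1, 1, 1, 1, 0, 0, 1, 0],
    "team_d": [1, 0, 1, 1, 0, 0, 0, 0],  # same outputs as team_a, so no diversity
    "team_e": [0, 1, 0, 1, 0, 1, 1, 0],  # weak classifier
}

# The paper selects five base classifiers; three are used here to keep the toy small.
team = select_team(preds, truth, k=3, n_candidates=4)
combined = majority_vote(preds, team)
print(team, accuracy(combined, truth))
```

In this toy setting the weak classifier is filtered out by the accuracy ranking, and the errors of the remaining classifiers fall on different samples, which is why the majority vote can outperform each individual member.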


Highly cited papers; Data mining; Citation network



We thank Dr. Xin Huang for the fruitful discussion. This work was supported by the National Natural Science Foundation of China (Grant Nos. 71003020; 70973031), the special funds of Central College Basic Scientific Research Bursary (Grant No. DL09BB51), and the research foundation of the ISTIC-Thomson Reuters Joint Lab for Scientometrics Research.



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2011

Authors and Affiliations

  1. School of Management, Harbin Institute of Technology, Harbin, People’s Republic of China
  2. Northeast Forestry University, Harbin, People’s Republic of China
  3. School of Power Engineering, Harbin Institute of Technology, Harbin, People’s Republic of China
