Mining typical features for highly cited papers
In this paper, we discuss the application of data mining tools to identify typical features of highly cited papers (HCPs). The feature space used to model HCPs was established by integrating papers' external features and quality features. A series of predictor teams was then extracted from the feature space with a rough set reduction framework, and each predictor team was used to construct a base classifier. The five base classifiers that combined the highest classification performance with the greatest overall diversity were selected to construct a multi-classifier system (MCS) for HCPs. The combined prediction model performed better than models built on a single predictor team. Eleven typical prediction features for HCPs were extracted on the basis of the MCS. The findings show that both a paper's inner quality and its external features, mainly represented by the reputation of the authors and journals, contribute to the paper becoming highly cited.
Keywords: Highly cited papers, Data mining, Citation network
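The selection and combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how performance and diversity were weighed against each other or how the base classifiers were combined, so everything here is an assumption: the score is mean accuracy plus a weighted mean pairwise disagreement, the combination rule is a simple majority vote, labels are binary (1 = HCP, 0 = not HCP), and the names `select_team`, `majority_vote`, and `weight` are hypothetical.

```python
from itertools import combinations


def accuracy(pred, truth):
    """Fraction of samples on which a classifier's prediction matches the label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers predict differently."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def select_team(preds, truth, k=5, weight=0.5):
    """Pick the k classifiers maximizing mean accuracy + weight * mean disagreement.

    `preds` maps a classifier name to its list of 0/1 predictions on a
    validation set. The scoring rule is an illustrative assumption, not the
    criterion used in the paper.
    """
    best, best_score = None, float("-inf")
    for subset in combinations(sorted(preds), k):
        mean_acc = sum(accuracy(preds[name], truth) for name in subset) / k
        pairs = list(combinations(subset, 2))
        mean_div = (
            sum(disagreement(preds[a], preds[b]) for a, b in pairs) / len(pairs)
            if pairs
            else 0.0
        )
        score = mean_acc + weight * mean_div
        if score > best_score:
            best, best_score = subset, score
    return best


def majority_vote(preds, team):
    """Combine the selected classifiers: predict 1 when more than half vote 1."""
    n_samples = len(preds[team[0]])
    return [
        int(2 * sum(preds[name][i] for name in team) > len(team))
        for i in range(n_samples)
    ]
```

The exhaustive search over subsets is only practical for a small pool of base classifiers; with an odd team size such as five, the majority vote never ties.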
We thank Dr. Xin Huang for the fruitful discussion. This work was supported by the National Natural Science Foundation of China (Grant Nos. 71003020; 70973031), the special funds of Central College Basic Scientific Research Bursary (Grant No. DL09BB51), and the research foundation of the ISTIC-Thomson Reuters Joint Lab for Scientometrics Research.