Abstract
As the framework of scientific research, subject classification plays an important role in the development of science. To align the current expert subject-classification system with the development of science, and to describe scientific output more appropriately at the subject level, we study the relationships among the natural-science-related sub-categories of the Chinese Library Classification (CLC) using objective, computerized scientometric methods, and we propose modifications to the first two levels of subjects in the existing system. Taking the Chinese Science Citation Database (CSCD) as our data source, this article studies the similarity of subjects based on journal coupling strength. We then build an improved subject-classification system whose top categories follow the Chinese Library Classification and whose sub-categories are the ensemble clustering result based on the journal coupling measure. Further, to help identify and interpret the rationality of this improved classification system, we use text mining methods, such as keyword recognition and topic detection, to explain from a semantic perspective why some subjects are similar. Our study shows that the improved subject-classification system constructed in this article not only conforms to prior experience and cognition but also incorporates knowledge of subject development.
Notes
In this study, we use the term “cross-citation” to refer to citing and cited behavior among articles, journals, authors, and so on. We also use the term “coupling”, as in “journal coupling”, to refer to the measure we use to study the similarity between different journals or different subjects based on the “cross-citation” behavior among them. That is, we use “journal coupling” to study the “cross-citation” relationships among subjects, so the “cross-citation” relationship is the basis of the similarity measure “journal coupling”.
The 80 % threshold was determined by repeated trials, so that the sparsity of the adjacency matrix is reduced significantly without losing much of the raw information.
In the subject-journal matrix derived in step 3, the number in row i and column j indicates how many times journal j is cited by subject i. To avoid the influences indicated in step 4, we calculate the journal-based coupling strength between subjects with the simple method (a basic method of calculating coupling strength), which considers only the number of journals coupled by two subjects, not the number of cites. We therefore convert the original cites in the matrix to 0–1 values indicating whether a citation from the subject to the journal exists. The simple method does not make full use of the original information. However, the bias from insufficient data usage is smaller than the bias from the sensitive cite counts, so we eventually chose this method. In future work we will try to improve our data quality and to apply other coupling calculation methods, such as the binary one proposed by Rousseau et al. (2004).
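The binarization and the simple coupling count described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `simple_coupling` and the toy matrix are hypothetical, and no normalization of the counts is applied because the note does not specify one.

```python
def simple_coupling(cites):
    """Count, for every pair of subjects, the journals cited by both.

    cites[i][j] is the number of times subject i cites journal j.
    Only the presence of a citation matters, not the count.
    """
    # Convert raw cites to 0-1 values: 1 if subject i cites journal j at all.
    binary = [[1 if c > 0 else 0 for c in row] for row in cites]
    n = len(binary)
    # Entry (a, b) is the number of journals cited by both subject a and subject b.
    return [
        [sum(x * y for x, y in zip(binary[a], binary[b])) for b in range(n)]
        for a in range(n)
    ]


# Hypothetical toy data: 3 subjects (rows) x 4 journals (columns).
toy_cites = [
    [5, 0, 2, 0],
    [1, 3, 0, 0],
    [0, 7, 4, 1],
]
coupling = simple_coupling(toy_cites)
```

In the toy data, subjects 0 and 1 share exactly one journal even though their raw cite counts for it differ (5 versus 1), which is the insensitivity to cite counts that the note argues for.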
We choose the general Gower’s coefficient because it is suitable for handling nominal, ordinal, and binary data. Moreover, because it allows weights to be assigned to different variables, the distance calculation is more robust.
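To make the weighted, mixed-type distance concrete, here is a minimal sketch of a Gower-style dissimilarity between two records. It is an illustration under assumptions rather than the computation used in the study: the function name `gower_distance` and its parameters are hypothetical, binary variables are treated symmetrically, and ordinal variables are assumed to be already converted to ranks and passed as numeric.

```python
def gower_distance(x, y, kinds, ranges, weights=None):
    """Weighted Gower-style dissimilarity between records x and y.

    kinds[k]   -- "numeric" (also used here for ranked ordinal data) or "nominal"
                  (also used here for binary data); an assumption of this sketch.
    ranges[k]  -- value range of numeric variable k over the data set
                  (ignored for nominal variables).
    weights[k] -- non-negative weight of variable k (defaults to 1 for all).
    """
    if weights is None:
        weights = [1.0] * len(x)
    num = 0.0
    den = 0.0
    for xk, yk, kind, rng, w in zip(x, y, kinds, ranges, weights):
        if xk is None or yk is None:
            continue  # a missing value makes the variable non-comparable
        if kind == "numeric":
            # Range-normalized absolute difference, in [0, 1].
            d = abs(xk - yk) / rng if rng else 0.0
        else:
            # Nominal or binary: 0 on a match, 1 on a mismatch.
            d = 0.0 if xk == yk else 1.0
        num += w * d
        den += w
    if den == 0:
        raise ValueError("no comparable variables")
    return num / den


# Hypothetical records: one numeric, one nominal, one binary variable.
d_example = gower_distance(
    (1.0, "a", 1), (3.0, "b", 1),
    kinds=("numeric", "nominal", "nominal"),
    ranges=(4.0, None, None),
)
```

Each variable contributes a value in [0, 1], so no single variable dominates because of its scale, and the weights let some variables count more than others.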
For each number of clusters k, the Gap statistic compares log(W(k)) with E^*[log(W(k))], where the latter is defined via bootstrapping, i.e. simulating from a reference distribution. The optimal number of clusters is the one at which log(W(k)) decreases fastest, that is, at which the Gap statistic increases fastest toward its maximum.
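For reference, the comparison described in this note can be written out as below, following the definition in Tibshirani et al. (2001). The symbols C_r (the r-th cluster), n_r (its size), d_{ii'} (the distance between observations i and i') and s_{k+1} (the simulation standard error) come from that paper and do not appear in the note itself.

```latex
\mathrm{Gap}(k) = E^{*}\!\left[\log W(k)\right] - \log W(k),
\qquad
W(k) = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i,\, i' \in C_r} d_{ii'}
```

Tibshirani et al. (2001) select the smallest k for which \mathrm{Gap}(k) \ge \mathrm{Gap}(k+1) - s_{k+1}; the informal rule stated in this note should be read alongside that criterion.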
We believe the subject classification derived in this paper is applicable to other situations because the journals in the CSCD source list are all natural-science-related core journals. According to Garfield's Law of Concentration, the citation behavior of these core journals is strongly representative, so the modified citation-based subject system can, to some extent, be adopted in other situations that use the CLC.
References
Ahlgren, P., & Colliander, C. (2009). Document–document similarity approaches and science mapping: Experimental comparison of five approaches. Journal of Informetrics, 3(1), 49–63. doi:10.1016/j.joi.2008.11.003.
Archambault, É., Beauchesne, O. H., & Caruso, J. (2011). Towards a multilingual comprehensive and open scientific journal ontology. In E. C. M. Noyons, P. Ngulube, & J. Leta (Eds.), Proceedings of the 13th international conference of the international society for scientometrics and informetrics (pp. 66–77).
Börner, K., Klavans, R., Patek, M., Zoss, A. M., Biberstine, J. R., Light, R. P., et al. (2012). Design and update of a classification system: The UCSD map of science. PLoS One, 7(7), e39464. doi:10.1371/journal.pone.0039464.
Boyack, K. W., Klavans, R., & Börner, K. (2005). Mapping the backbone of science. Scientometrics, 64(3), 351–374.
Braam, R. R., Moed, H. F., & van Raan, A. F. J. (1991). Mapping of science by combined co-citation and word analysis: I: Structural Aspects. Journal of the American Society for Information Science and Technology, 42(4), 233–251.
Cason, H., & Lubotsky, M. (1936). The influence and dependence of psychological journals on each other. Psychological Bulletin, 33(2), 95–103.
Chang, Y. F., & Chen, C.-M. (2011). Classification and visualization of the social science network by the minimum span clustering method. Journal of the American Society for Information Science and Technology, 62(8), 2404–2413.
Chen, C. M., Ibekwe-SanJuan, F., & Hou, J. H. (2010). The structure and dynamics of co-citation clusters: A multiple-perspective co-citation analysis. Journal of the American Society for Information Science and Technology, 61(7), 1386–1409.
Daniel, R. S., & Louttit, C. M. (1953). Professional problems in psychology. New York: Prentice Hall.
Everitt, B. (1974). Cluster analysis. London: Heinemann Educ.
Glänzel, W., & Schubert, A. (2003). A new classification scheme of science fields and subfields designed for scientometric evaluation purposes. Scientometrics, 56(3), 357–367.
Gómez-Núñez, A. J., Batagelj, V., Vargas-Quesada, B., Moya-Anegón, F., & Chinchilla-Rodríguez, Z. (2014). Optimising SCImago journal & country rank classification by community detection. Journal of Informetrics, 8(2), 369–383.
Gómez-Núñez, A. J., Vargas-Quesada, B., & Moya-Anegón, F. (2015). Updating the SCImago journal and country rank classification: A new approach using Ward's clustering and alternative combination of citation measures. Journal of the Association for Information Science and Technology, 67(1), 178–190.
Hartigan, J. A., & Wong, M. A. (1979). A K-means clustering algorithm. Applied Statistics, 28(1), 100–108.
Katz, J. S., & Hicks, D. (1995). The classification of interdisciplinary journals: A new approach (Version 2.0). In M.E.D. Koenig & A. Bookstein (Eds.), Proceedings of the Fifth Biennial Conference of the International Society for Scientometrics and Informatics (pp. 245–254). Medford: Learned Information.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Kessler, M. M. (1963). Bibliographic coupling between scientific Papers. American Documentation, 14(1), 10–25.
Kronegger, L., Mali, F., & Ferligoj, A. (2013). Classifying scientific disciplines in Slovenia: A study of the evolution of collaboration structures. Journal of the American Society for Information Science and Technology, 66(2), 321–339.
Leydesdorff, L. (2002). Dynamic and evolutionary updates of classificatory schemes in scientific journal structures. Journal of the American Society for Information Science and Technology, 53(12), 987–994.
Leydesdorff, L. (2004a). Clusters and maps of science journals based on bi-connected graphs in the Journal Citation Reports. Journal of Documentation, 60(4), 371–427.
Leydesdorff, L. (2004b). Top-down decomposition of the Journal Citation Report of the Social Science Citation Index: Graph- and factor-analytical approaches. Scientometrics, 60(2), 159–180.
Leydesdorff, L. (2006). Can scientific journals be classified in terms of aggregated journal-journal citation relations using the journal citation reports. Journal of the American Society for Information Science and Technology, 57(5), 601–603.
Leydesdorff, L., & Cozzens, S. E. (1993). The delineation of specialties in terms of journals using the dynamic journal set of the SCI. Scientometrics, 26(1), 135–156.
Leydesdorff, L., & Rafols, I. (2008). A global map of science based on the ISI discipline categories. Journal of the American Society for Information Science and Technology, 60(2), 348–362.
Leydesdorff, L., & Rafols, I. (2012). Interactive overlays: A new method for generating global journal maps from Web-of-Science data. Journal of Informetrics, 6(2), 318–332. doi:10.1016/j.joi.2011.11.003.
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, (pp. 281–297) University of California Press, Berkeley, Calif.
Marshakova, S. I. (1973). System of Document Connections Based on References. Scientific and Technical Information Serial of VINITI, 6(2), 3–8.
Narin, F. (1976). Evaluative bibliometrics: The use of publication and citation analysis in the evaluation of scientific activity. Washington, DC: National Science Foundation.
Narin, F., Carpenter, M., & Berlt, N. C. (1972). Interrelationships of scientific journals. Journal of the American Society for Information Science, 23(5), 323–331.
Ni, C., Sugimoto, C. R., & Jiang, J. (2013). Venue-author-coupling: A measure for identifying disciplines through author communities. Journal of the American Society for Information Science and Technology, 64(2), 265–279.
Qiu, J., & Dong, K. (2013). A Comparative study on the ability of author co-occurrence network in revealing scientific structure. Journal of library science china, 39(1), 15–24. (In Chinese).
Qiu, J., & Liu, G. (2014). Research of discipline knowledge aggregation based on the journal-author coupling method. Journal of intelligence, 33(4), 17–22. (In Chinese).
Reynolds, A., Richards, G., de la Iglesia, B., & Rayward-Smith, V. J. (1992). Clustering rules: A comparison of partitioning and hierarchical clustering algorithms. Journal of Mathematical Modeling and Algorithms, 5(4), 475–504.
Rousseau, R., & Zuccala, A. (2004). A classification of author co-citations: Definitions and search strategies. Journal of the American Society for Information Science and Technology, 55(6), 513–529.
Small, H. (1973). Co-citation in the Scientific Literature: A New Measure of the Relationship Between Two Documents. Journal of the American Society for Information Science, 24(4), 265.
Tibshirani, R., Walther, G., & Hastie, T. (2001). Estimating the number of data clusters via the Gap statistic. Journal of the Royal Statistical Society B, 63(2), 411–423.
Waltman, L., & Van Eck, N. J. (2012). A new methodology for constructing a publication-level classification system of science. Journal of the Association for Information Science and Technology, 63(12), 2378–2392. doi:10.1002/asi.22748.
White, H. D., & McCain, K. W. (1998). Visualizing a Discipline: An Author Co-Citation Analysis of Information Science, 1972–1995. Journal of the American Society for Information Science, 49(4), 327–355.
Zhang, L., Janssens, F., Liang, L., & Glänzel, W. (2010). Journal cross-citation analysis for validation and improvement of journal-based discipline classification in bibliometric research. Scientometrics, 82(5), 687–706.
Zhang, L., Liang, L., Liu, Z., & Glänzel, W. (2012). The analysis of science structure based on journal clustering and SOOI classification system. Study in science of science, 30(9), 14–22. (In Chinese).
Zhao, D. Z., & Strotmann, A. (2008). Evolution of research activities and intellectual in information science 1996–2005: Introducing author bibliographic-coupling analysis. Journal of the American Society for Information Science and Technology, 59(13), 2070–2086.
Appendix 1
See Table 7.
About this article
Cite this article
Zhang, J., Liu, X. & Wu, L. The study of subject-classification based on journal coupling and expert subject-classification system. Scientometrics 107, 1149–1170 (2016). https://doi.org/10.1007/s11192-016-1890-9
DOI: https://doi.org/10.1007/s11192-016-1890-9