Scientometrics, Volume 91, Issue 1, pp 203–217

A co-word analysis of digital library field in China

Authors

  • Gao-Yong Liu
    • School of Management, Guangdong University of Technology
    • Center for Studies of Information Resources, Wuhan University
  • Hui-Ling Wang
    • Management School of Jinan University

DOI: 10.1007/s11192-011-0586-4

Cite this article as:
Liu, G., Hu, J. & Wang, H. Scientometrics (2012) 91: 203. doi:10.1007/s11192-011-0586-4

Abstract

The aim of this study is to map the intellectual structure of the digital library (DL) field in China during the period 2002–2011. Co-word analysis was employed to reveal the patterns of the DL field in China by measuring the association strength of keywords in relevant journals. Data were collected from the Chinese Journal Full-Text Database for 2002–2011. The co-occurrence matrix of keywords was then analyzed using multivariate statistical analysis and social network analysis. The results comprise five parts: seven clusters of keywords, a two-dimensional map, the density and centrality of the clusters, a strategic diagram, and a relation network. They show that the DL field in China has both hot and marginal research topics, but that its research topics are relatively decentralized compared with international studies.

Keywords

Digital library in China; Co-word analysis; Research advances

Introduction

Research on digital libraries (DL) in China has changed drastically over the past years, and DL has become a hot research area in Library and Information Science and in Computer Science (Qiu and Ma 2010). Researchers in China have addressed most research topics in the DL field, producing a number of important and influential articles (e.g., Zhao 2006; Xu and Niu 2007; Gao 2008; Wang and Chen 2009; Gao and Liu 2011).

Many professionals have analyzed the research advances of the DL field in China from qualitative and quantitative perspectives. Li (2009) gathered 1,948 articles on DL published in 17 core journals in Library & Information Science from 1998 to 2007, with an emphasis on their distribution over time and space, content, and authors. Zhang and Lv (2010) reviewed the development of DL from 2004 to 2008 and pointed out that digital resources, technology and service were the three major research directions in the DL field in China. Qiu and Wang (2010) revealed the distribution regularity of articles indexed in CNKI, the core authors, and the research focuses according to keyword frequency. They found that technology and service were the two major trends of DL. Qiu and Ma (2010) found both concentration and scatter trends in the keyword distribution in DL, and noted that more and more researchers were paying attention to service and users in DL. Liu and Zhang (2011) conducted a statistical analysis of articles on DL indexed in CSSCI from 2000 to 2010 and analyzed their subjects with cluster analysis. Their article discussed the current situation, progress and tendency of DL research in China, and concluded that the technology and construction of DL were the research emphases. Dong (2009) conducted a co-word analysis of journal articles on DL from 1999 to 2008, using data collected from CNKI. He found that the focuses of DL research in China were information organization, resources construction, information service and copyright, and that these four focuses were interrelated.

However, few studies so far have focused on hidden information such as the relationships and structure among research topics in the DL field in China. This study attempts to map the intellectual structure of the DL field in China using co-word analysis, including the relationships among keywords and the structure and status of research. The results provide a basis for grasping the advances of the DL field in China.

Methodology

Keywords can provide an adequate description of a paper’s content. This study assumes that keywords with sufficient frequency can be chosen as the subject of co-word analysis to represent specific topics. Any two keywords that co-occur within an article are taken to be related in the topics to which they refer (Cambrosio et al. 1993). Many co-occurrences of a pair of keywords across articles suggest that the two keywords may belong to one research theme (Ding et al. 2001). The correlation between two keywords is calculated from the number of articles that include both of them.

Co-word analysis is a kind of co-occurrence analysis (Small 1973; Small and Griffith 1974), an important bibliometric method for mapping the relationships among concepts, ideas, and problems in science and social science (Callon et al. 1983; Callon et al. 1991). Many researchers have used co-word analysis to explore the concept network in different fields (Coulter et al. 1998; Ding et al. 2001; Wang et al. 2011).

Co-word analysis, counting and analyzing the co-occurrences of keywords in articles on the given subject or field, could provide an immediate picture of the actual content of research topics (Callon et al. 1991; Ding et al. 2001). With the aid of multivariate statistical analysis and social network analysis, co-word analysis, different from other co-occurrence methods, is able to visualize the intellectual structure of a specific discipline through measuring the association strength of keywords from the relevant journals or other publications in this area. That is to say, co-word analysis could reveal patterns effectively in a specific discipline.

Data collection and process

In co-word analysis, once a research area is selected, keywords are extracted from the related journal articles or other publications, and a matrix based on keyword co-occurrence is built. The value of each cell in the matrix is the co-word frequency of two keywords, that is, the number of articles in which both appear. The higher the co-occurrence frequency of two keywords, the more closely they are related. Finally, the original matrix is transformed into a correlation matrix using a specific correlation coefficient for further analysis.
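
The matrix-building step described above can be sketched as follows. This is a minimal illustration, not the authors' program: the function name `cooccurrence_matrix` and the toy articles are assumptions introduced here, and only the counting rule (one count per article containing both keywords) comes from the text.

```python
from itertools import combinations

def cooccurrence_matrix(articles, vocabulary):
    """Count, for every keyword pair, the number of articles containing both."""
    index = {kw: i for i, kw in enumerate(vocabulary)}
    n = len(vocabulary)
    matrix = [[0] * n for _ in range(n)]
    for keywords in articles:
        # Deduplicate and keep only keywords that belong to the vocabulary.
        present = sorted({index[kw] for kw in keywords if kw in index})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # keep the matrix symmetric
    return matrix

# Toy data, invented for illustration (not taken from the study).
vocabulary = ["copyright", "metadata", "ontology"]
articles = [
    ["copyright", "metadata"],
    ["metadata", "ontology", "copyright"],
    ["ontology", "metadata"],
]
matrix = cooccurrence_matrix(articles, vocabulary)
```

The diagonal is left at zero here; how the diagonal is handled before computing correlations is a separate decision.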

We chose the Chinese Journal Full-Text Database (CJFTD) as the data source. We retrieved articles in the DL field from CJFTD by searching for “Digital Library” in the title, with a time span from 2002 to 2011. Note that the retrieval was restricted to the source category “core journals”. These journals, indexed in the Chinese Science Citation Database (CSCD) or the Chinese Humanities and Social Science Citation Database (CSSSCD), are important and leading in their research areas in China. It can, therefore, be said that CSCD and CSSSCD in China correspond to SCI and SSCI.

In this study, a program in Java was developed to count the number of times two keywords appeared together in the same article. This yielded a symmetric co-occurrence matrix. The diagonal cells were treated as missing data, and the value of each non-diagonal cell was the co-occurrence frequency of the corresponding keyword pair. This original matrix was then transformed into a Pearson’s correlation matrix to indicate the similarity and dissimilarity of each keyword pair (Van Eck and Waltman 2009).
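
The transformation into a Pearson’s correlation matrix can be sketched as below. The text does not say exactly how the missing diagonal enters the calculation, so this sketch assumes pairwise deletion: when correlating keywords i and j, positions i and j are skipped in both profiles. The function names and the 5 × 5 toy matrix are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant profile is given correlation 0
    return sxy / sqrt(sxx * syy)

def correlation_matrix(cooc):
    """Correlate keyword profiles, skipping the missing diagonal cells."""
    n = len(cooc)
    corr = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed pairwise deletion: drop the cells that touch the diagonal.
            keep = [k for k in range(n) if k not in (i, j)]
            r = pearson([cooc[i][k] for k in keep], [cooc[j][k] for k in keep])
            corr[i][j] = corr[j][i] = r
    return corr

# Toy symmetric co-occurrence matrix (invented values).
cooc = [
    [0, 4, 1, 2, 3],
    [4, 0, 2, 4, 6],
    [1, 2, 0, 5, 1],
    [2, 4, 5, 0, 2],
    [3, 6, 1, 2, 0],
]
corr = correlation_matrix(cooc)
```

In the toy data, keywords 0 and 1 co-occur with the other three keywords in proportional amounts, so their correlation is 1 even though their raw counts differ.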

Method of data analysis

As in other studies using co-word analysis, we chose multivariate statistical analysis and social network analysis. In multivariate statistical analysis, clustering and multidimensional scaling (MDS) are commonly used for co-occurrence data. Cluster analysis directly displays the keyword clusters and the relations among topics, and the map generated by MDS reflects the degree of correlation through the location of research topics in the map. In this study, the cluster analysis and MDS were conducted using SPSS19.0 (Ding et al. 2001). The visualization map and its network characteristics were obtained by analyzing the Pearson’s correlation matrix using Ucinet6.0. These include centrality, density, the core-periphery structure (Lee 2008), a strategic diagram and the network chart, which show the status of the DL field in China more intuitively.

Result and discussion

Frequencies of keywords

A total of 2,647 related articles and 9,538 keywords (3.6 per article) were collected as the co-word analysis sample. The distribution of keyword frequencies follows a power law with an exponent of −1.04 (Fig. 1), indicating that the research structure of DL in China is a scale-free network. In other words, a few research topics that connect many other topics are the focuses, receiving the most attention and having the most extensive connections.
https://static-content.springer.com/image/art%3A10.1007%2Fs11192-011-0586-4/MediaObjects/11192_2011_586_Fig1_HTML.gif
Fig. 1 Distribution of keyword frequencies
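
One common way to estimate such an exponent is a least-squares line through the log-log plot. The article does not state how the value −1.04 was fitted or exactly which quantities Fig. 1 plots, so the sketch below is an assumption: it treats the distribution as "number of keywords occurring f times" against f, and checks the fitting routine on synthetic points that follow an exact power law.

```python
from collections import Counter
from math import log

def frequency_distribution(keyword_counts):
    """Map each frequency value f to the number of keywords that occur f times."""
    return sorted(Counter(keyword_counts.values()).items())

def loglog_slope(points):
    """Least-squares slope of log(y) against log(x) for (x, y) points."""
    xs = [log(x) for x, _ in points]
    ys = [log(y) for _, y in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic points on an exact power law y = 1000 * f ** -1.04 (not study data).
points = [(f, 1000.0 * f ** -1.04) for f in range(1, 51)]
slope = loglog_slope(points)

# Shape of the input expected from real keyword counts (toy example).
toy_distribution = frequency_distribution({"a": 1, "b": 1, "c": 2})
```

On real data the low-frequency tail is noisy, so a log-log regression is only a rough estimate of the exponent.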

To achieve more precise results, we standardized the keywords by merging synonyms (e.g., “ontology technology” and “ontology technology application” were replaced by “ontology”; “information resource organization” was replaced by “information organization”) and excluding general terms that are meaningless or too broad (e.g., theories, construction, development, influence, applications). Finally, 66 keywords with a frequency of more than 10 were chosen (shown in Table 1). Together these 66 keywords occur 5,285 times (about 55% of the total), covering the main research topics of the DL field in China to a very great extent. Note that the keyword “Digital Library” is our research object and therefore carries no discriminating information for the study, so we removed it from the analysis below.
Table 1 The top 66 keywords

| No. | Keyword | Frequency |
| --- | --- | --- |
| 1 | Digital library | 2,249 |
| 2 | Copyright | 236 |
| 3 | Personalized service | 139 |
| 4 | User | 125 |
| 5 | Information retrieval | 122 |
| 6 | Metadata | 111 |
| 7 | Information resources | 109 |
| 8 | Information service | 107 |
| 9 | Grid | 106 |
| 10 | Intellectual property | 104 |
| 11 | Ontology | 101 |
| 12 | Web | 92 |
| 13 | Library | 79 |
| 14 | Evaluation | 78 |
| 15 | Information resources construction | 64 |
| 16 | Interoperability | 59 |
| 17 | Information resource sharing | 58 |
| 18 | Digitalization | 56 |
| 19 | Data mining | 52 |
| 20 | Storage | 50 |
| 21 | Database | 46 |
| 22 | Knowledge management | 45 |
| 23 | Reference service | 45 |
| 24 | Information security | 44 |
| 25 | Storage area network | 44 |
| 26 | Traditional library | 43 |
| 27 | Information organization | 43 |
| 28 | Personal digital library | 40 |
| 29 | XML | 39 |
| 30 | Portal | 37 |
| 31 | Library alliance | 36 |
| 32 | Knowledge organization | 35 |
| 33 | Semantic web | 33 |
| 34 | Digital resource | 32 |
| 35 | Standard | 30 |
| 36 | University library | 29 |
| 37 | Information resource integration | 29 |
| 38 | Knowledge service | 27 |
| 39 | OAI | 27 |
| 40 | University | 27 |
| 41 | Information filtering | 23 |
| 42 | Google | 23 |
| 43 | Information dissemination | 22 |
| 44 | Library construction | 22 |
| 45 | Digital library construction | 22 |
| 46 | Web2.0 | 22 |
| 47 | Librarian | 21 |
| 48 | E-commerce | 21 |
| 49 | Web service | 21 |
| 50 | Cloud computing | 21 |
| 51 | Information integration | 20 |
| 52 | Information technology | 19 |
| 53 | Reader | 19 |
| 54 | Service mode | 18 |
| 55 | Information visualization | 17 |
| 56 | Network attached storage | 16 |
| 57 | Search engine | 15 |
| 58 | Information architecture | 15 |
| 59 | Hybrid library | 14 |
| 60 | Information literacy | 14 |
| 61 | Information extraction | 13 |
| 62 | RDF | 13 |
| 63 | Digital university library | 12 |
| 64 | Library science | 12 |
| 65 | Open source software | 11 |
| 66 | Ubiquitous knowledge environment | 11 |
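
The standardization and selection step that produced Table 1 can be sketched as follows. The synonym map and the list of general terms contain only the examples quoted in the text; the complete lists used in the study are not given, and the function names are hypothetical.

```python
from collections import Counter

# Only the examples mentioned in the text; the study's full lists are not known.
SYNONYMS = {
    "ontology technology": "ontology",
    "ontology technology application": "ontology",
    "information resource organization": "information organization",
}
GENERAL_TERMS = {"theories", "construction", "development", "influence", "applications"}

def standardize(keywords):
    """Normalize case, merge synonyms, and drop general terms and duplicates."""
    cleaned = []
    for kw in keywords:
        kw = kw.strip().lower()
        kw = SYNONYMS.get(kw, kw)
        if kw and kw not in GENERAL_TERMS and kw not in cleaned:
            cleaned.append(kw)
    return cleaned

def frequent_keywords(articles, threshold):
    """Keep keywords whose article frequency is strictly above the threshold."""
    counts = Counter(kw for article in articles for kw in standardize(article))
    return {kw: c for kw, c in counts.items() if c > threshold}

# Toy input (invented); the study used a threshold of 10 on 2,647 articles.
example = standardize(["Ontology technology", "ontology", "Development", "Metadata"])
selected = frequent_keywords([["ontology"], ["ontology technology"], ["metadata"]], 1)
```

Merging synonyms before counting matters: in the toy input, "ontology" reaches a frequency of 2 only because "ontology technology" is mapped onto it.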

High co-word frequency and correlation coefficients reflect the importance of keywords to some extent. The ten keywords with the highest total co-word frequency are copyright (132), ontology (124), personalized service (119), metadata (111), user (110), grid (101), information retrieval (99), information service (97), interoperability (78) and intellectual property (77). The ten keywords with the highest total co-word correlation coefficients are information resources (10.91), information retrieval (10.52), knowledge management (10.34), knowledge organization (10.32), information service (10.07), information organization (9.48), Web (9.32), user (9.20), personalized service (9.03) and personal digital library (9.00). It can, therefore, be said that these research topics are the focuses of the DL field in China. Note that information retrieval, information service, user, and personalized service rank high on both frequency and correlation coefficient, indicating that these topics are major focuses and bridges connecting other research topics in the DL field.

Multivariate statistical analysis

We conducted the cluster analysis using hierarchical clustering with Ward’s method, with squared Euclidean distance as the distance measure. The 65 keywords of the DL field in China were divided into seven clusters, named Cluster1 to Cluster7. Based on frequency, co-word frequency and co-word correlation coefficient, the top three keywords in each cluster were chosen to represent the cluster, because keywords with lower frequency have received little attention from researchers in China. Table 2 shows the information of each cluster.
Table 2 Seven clusters of DL field in China

| Cluster | Number of keywords | Cluster name | Keywords |
| --- | --- | --- | --- |
| 1 | 3 | Storage technology | Storage; storage area network; network attached storage |
| 2 | 6 | Library; digitalization; university | Library; digitalization; university; cloud computing; information technology; reader |
| 3 | 13 | Evaluation; information security; traditional library | Evaluation; information security; traditional library; library alliance; digital resource; standard; university library; library construction; digital library construction; hybrid library; digital university library; library science; open source software |
| 4 | 8 | Copyright; information resources; intellectual property | Copyright; information resources; intellectual property; Web; information resources construction; database; google; information dissemination |
| 5 | 13 | Personalized service; user; information service; data mining | Personalized service; user; information service; data mining; reference service; information filtering; Web2.0; librarian; E-commerce; service mode; information architecture; information literacy; ubiquitous knowledge environment |
| 6 | 8 | Information retrieval; ontology; knowledge management | Information retrieval; ontology; knowledge management; knowledge organization; semantic web; information integration; information visualization; information extraction |
| 7 | 14 | Metadata; grid; interoperability; information resource sharing | Metadata; grid; interoperability; information resource sharing; information organization; personal digital library; XML; portal; information resource integration; knowledge service; OAI; Web service; search engine; RDF |
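
The clustering that produced Table 2 was run in SPSS; the sketch below only illustrates the idea behind Ward’s method with squared Euclidean distance. At each step it merges the two clusters whose union increases the within-cluster sum of squares the least. It assumes each keyword is represented by a numeric profile (for example its row in the correlation matrix); the function names and toy points are invented.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance merge criterion."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                na, nb = len(ma), len(mb)
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * squared_euclidean(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        na, nb = len(ma), len(mb)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)

# Toy two-dimensional profiles (invented); real profiles would have 65 columns.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
groups = ward_clusters(points, 3)
```

This brute-force version rescans all pairs at every step, which is adequate for 65 keywords but not for large vocabularies.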

The current research topics of the DL field in China are shown in Table 2. For example, Cluster1 relates to storage technology in DL (Li and Dong 2009; Shen 2010); Cluster2 includes research on digitalization, cloud computing, and information technology (Wu 2008; Ding 2010; Wang and Chen 2009).

In the international DL field, two typical articles describe the research status well (Su 2009; Wei and Wei 2011). Su (2009) analyzed the international status of the DL field using Web of Science as the data source. She also used co-word analysis to identify two large research directions (information organization and standard, technology), and indicated that the trend toward crossing and amalgamation among research themes in the international DL field had become more and more obvious since 2006. Wei and Wei (2011) summarized four hot research areas of the international DL field based on an analysis of DL articles in Web of Science: technology, users, evaluation, and application. The research topics of the international DL field thus show crossing and amalgamation. Compared with the international DL field, research topics of the DL field in China are relatively decentralized.

Pearson’s correlation indicates the similarity and dissimilarity of each keyword pair, and can also indicate the degree of correlation among research themes. In this article, one cluster represents one large research theme or research direction of the DL field in China. The degree of correlation among these clusters shows their overall status or position (central or marginal). A research theme or cluster with a high total correlation coefficient and more connections with others is central and important in the DL field.

The total co-word correlation coefficient of each keyword indicates its connection with other keywords, and also its overall status or position in the DL field. We then summed these totals over the keywords in each cluster, which represents one large research theme or direction, to indicate the cluster’s status or position in the DL field.

According to the Pearson’s correlation matrix, we summed the co-word frequency and correlation coefficient of the keywords in each cluster and calculated their averages. Each cluster is discussed below in light of its frequency and co-word data (shown in Table 3).
Table 3 The frequency and co-word data of each cluster

| Cluster | Total frequency | Total co-word frequency | Total co-word correlation coefficient | Average frequency | Average co-word frequency | Average co-word correlation coefficient |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 110 | 99 | −4.71 | 36.67 | 33.00 | −1.57 |
| 2 | 220 | 184 | 29.68 | 36.67 | 30.67 | 4.95 |
| 3 | 418 | 280 | 38.34 | 32.15 | 21.54 | 2.95 |
| 4 | 678 | 486 | 63.41 | 84.75 | 60.75 | 7.93 |
| 5 | 612 | 532 | 78.27 | 47.08 | 40.92 | 6.02 |
| 6 | 377 | 420 | 62.93 | 47.13 | 52.50 | 7.87 |
| 7 | 621 | 685 | 84.95 | 44.36 | 48.93 | 6.07 |
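
The averages in Table 3 are the cluster totals divided by the number of keywords per cluster given in Table 2. The short check below recomputes them from the published totals; the variable names are illustrative, and all numbers are copied from Tables 2 and 3.

```python
# cluster: (number of keywords, total frequency, total co-word frequency,
#           total co-word correlation coefficient), from Tables 2 and 3.
totals = {
    1: (3, 110, 99, -4.71),
    2: (6, 220, 184, 29.68),
    3: (13, 418, 280, 38.34),
    4: (8, 678, 486, 63.41),
    5: (13, 612, 532, 78.27),
    6: (8, 377, 420, 62.93),
    7: (14, 621, 685, 84.95),
}

# Averages as reported in Table 3, in the same column order.
reported = {
    1: (36.67, 33.00, -1.57),
    2: (36.67, 30.67, 4.95),
    3: (32.15, 21.54, 2.95),
    4: (84.75, 60.75, 7.93),
    5: (47.08, 40.92, 6.02),
    6: (47.13, 52.50, 7.87),
    7: (44.36, 48.93, 6.07),
}

# Recompute each average as total / number of keywords.
averages = {c: tuple(t / size for t in sums) for c, (size, *sums) in totals.items()}

# Largest gap between a recomputed average and the reported one.
max_gap = max(abs(a - r) for c in totals for a, r in zip(averages[c], reported[c]))

# Clusters ordered by average frequency, highest first.
by_avg_frequency = sorted(averages, key=lambda c: averages[c][0], reverse=True)
```

The recomputed values agree with the table to within rounding at two decimals, and Cluster4, 6 and 5 come out as the three clusters with the highest average frequency.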

Here, the averages of the frequency and co-word data are treated as the important indexes for distinguishing the clusters, which represent the research topics.

(1) The higher average frequency in Cluster4, 5, and 6 indicates that these topics receive more attention in China, and vice versa. In particular, Cluster4 and 6 have the highest average co-word frequency and co-word correlation coefficient because they have more connections with topics in other clusters. These results indicate the importance of the topics in Cluster4 and 6, for instance the study of copyright and intellectual property, knowledge management and information retrieval, and visualization and integration based on ontology and the semantic web.

(2) Cluster3 has a low average frequency and a low average co-word correlation coefficient. This reflects that the topics in Cluster3 receive little attention and have little association with other topics. These topics lie in a marginal location and are no longer a research focus of the DL field in China.

(3) All three indexes are high in Cluster4, indicating that its topics are currently the major focuses and have more association with other topics. It is worth noting that more and more researchers in China are paying attention to intellectual property in information dissemination, and are drawing lessons from international experience and results.

(4) Cluster1 has the fewest keywords as well as the lowest average co-word correlation coefficient, which shows that its topics are relatively isolated. However, its relatively high average frequency also shows that a number of researchers in China have been studying these topics.

     

The top five keywords in each cluster (all three keywords in the case of Cluster1) were selected. These 33 keywords were analyzed with hierarchical clustering and MDS. For MDS, the ALSCAL procedure was used with Euclidean distance as the distance measure. The stress value is 0.1594 and RSQ is 0.8733, which is acceptable.
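
The two fit measures quoted above can be illustrated as follows. The sketch assumes that the stress value is Kruskal’s stress formula 1 and that RSQ is the squared correlation between the map distances and the disparities (the target distances derived from the data); the article does not spell out either definition. The coordinates and disparities are invented.

```python
from math import dist, sqrt

def pairwise_distances(coords):
    """Euclidean distances between all pairs of points in the map."""
    n = len(coords)
    return [dist(coords[i], coords[j]) for i in range(n) for j in range(i + 1, n)]

def kruskal_stress1(distances, disparities):
    """Kruskal's stress formula 1: misfit relative to the size of the distances."""
    misfit = sum((d - h) ** 2 for d, h in zip(distances, disparities))
    scale = sum(d ** 2 for d in distances)
    return sqrt(misfit / scale)

def rsq(distances, disparities):
    """Squared Pearson correlation between distances and disparities."""
    n = len(distances)
    mean_d, mean_h = sum(distances) / n, sum(disparities) / n
    sdh = sum((d - mean_d) * (h - mean_h) for d, h in zip(distances, disparities))
    sdd = sum((d - mean_d) ** 2 for d in distances)
    shh = sum((h - mean_h) ** 2 for h in disparities)
    return sdh ** 2 / (sdd * shh)

# Toy two-dimensional configuration and target disparities (invented).
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
distances = pairwise_distances(coords)  # pairs (0,1), (0,2), (1,2)
disparities = [3.0, 4.0, 6.0]
stress = kruskal_stress1(distances, disparities)
fit = rsq(distances, disparities)
```

Lower stress and higher RSQ both indicate that the two-dimensional map reproduces the dissimilarities more faithfully.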

These 33 keywords were divided into 8 clusters (named Cluster1’ to Cluster8’) by clustering analysis (shown in Fig. 2), and each cluster was circled in the two-dimensional map by MDS (shown in Fig. 3). The correlation degree of research topics could be intuitively discerned through the location and distance in Fig. 3.
https://static-content.springer.com/image/art%3A10.1007%2Fs11192-011-0586-4/MediaObjects/11192_2011_586_Fig2_HTML.gif
Fig. 2 Eight-cluster solution of the 33 keywords

https://static-content.springer.com/image/art%3A10.1007%2Fs11192-011-0586-4/MediaObjects/11192_2011_586_Fig3_HTML.gif
Fig. 3 Two-dimensional map of research topics of the DL field in China

On the basis of Figs. 2 and 3, we can reach the following conclusions.
(1) The correlation among clusters can be inferred from the clustering steps in Fig. 2, for example between Cluster2’ and 8’, among Cluster3’, 4’ and 5’, and between Cluster6’ and 7’. In Fig. 3, highly correlated research topics are located together. On the whole, the 8 clusters are scattered across the map, and the distribution of research topics is rather dispersed. It is worth noting that there are no core research topics. In China, most researchers choose and study one or a few topics of DL, while few study DL as a whole.

(2) In Fig. 3, from left to right, the research topics range from technology to theory. In China, theory is rich while technology and practice are poor. From bottom to top, the more compact focuses are located at the top, and the research topics at the bottom are relatively isolated. The research topics at the top are relatively mature, while those at the bottom are undeveloped. These research topics can be divided into four large research areas: the construction of DL (Cluster3’, Cluster4’ and Cluster5’), the organization of DL (Cluster6’ and Cluster7’), the service of DL (Cluster8’ and Cluster2’) and the storage of DL (Cluster1’). Furthermore, there are two large research fields: Cluster3’, 4’ and 5’ are located in quadrant I, and Cluster6’ and 7’ in quadrant II.

(3) The map shows that the construction and organization of DL are two significant research areas in China. Note that Cluster2’ and Cluster8’ were merged into one cluster in the clustering analysis, but their locations in the map and the distances among their keywords show that the study of DL service is basically fragmentary and unsystematic, so it deserves deeper research. Meanwhile, the research topics in Cluster1’ are in a marginal location, in line with the results above.

     

Analysis of co-word network

In this article, we analyzed the co-word network based on the Pearson’s correlation matrix using Ucinet6.0, covering centrality (degree centrality and betweenness centrality) and density. A core-periphery matrix was generated to determine the core keywords more precisely from the perspective of the whole structure. We then drew a strategic diagram to show the current status and trend of research themes more clearly. Finally, a relation network reflecting the structure and relationships of keywords was drawn with NetDraw, which is embedded in Ucinet.
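
For a valued network such as a correlation matrix, degree centrality and density can be sketched as below. This assumes the usual valued-network definitions (degree as the sum of a node’s tie values, density as the mean tie value over all ordered pairs); the exact Ucinet settings used in the study are not stated. The degree values reported in this section are each exactly 1.00 lower than the total co-word correlation coefficients listed earlier, which is consistent with summing a row of the correlation matrix without its diagonal entry of 1.

```python
def degree_centrality(matrix):
    """Valued degree: sum of each node's tie values, excluding the diagonal."""
    return [
        sum(value for j, value in enumerate(row) if j != i)
        for i, row in enumerate(matrix)
    ]

def density(matrix):
    """Valued density: mean off-diagonal tie value."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

# Toy correlation matrix (invented values).
corr = [
    [1.0, 0.5, -0.2],
    [0.5, 1.0, 0.3],
    [-0.2, 0.3, 1.0],
]
degrees = degree_centrality(corr)
network_density = density(corr)
```

Because correlations can be negative, both measures can be negative as well, as happens for Cluster1 in the cluster-level results.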

The top 10 keywords with high degree centrality and betweenness centrality are listed in Table 4.
Table 4 The top 10 of all keywords with high centrality

| Ranking | Keyword | Degree centrality | Keyword | Betweenness centrality |
| --- | --- | --- | --- | --- |
| 1 | Information resources | 9.91 | Web | 135.33 |
| 2 | Information retrieval | 9.52 | Reader | 54.98 |
| 3 | Knowledge management | 9.34 | University | 48.83 |
| 4 | Knowledge organization | 9.32 | Information resources | 45.64 |
| 5 | Information service | 9.07 | Storage area network | 38.09 |
| 6 | Information organization | 8.48 | Information organization | 31.61 |
| 7 | Web | 8.32 | Information service | 30.48 |
| 8 | User | 8.20 | Digital resource | 24.81 |
| 9 | Personalized service | 8.03 | Standard | 24.50 |
| 10 | Personal digital library | 8.00 | Network attached storage | 24.31 |

(1) The degree centrality in Table 4 shows the current core research topics in the DL field in China, such as information resources, information retrieval, knowledge management, knowledge organization, and information service. Keywords with high betweenness centrality act as bridges among research topics, such as Web, reader, university, information resources, and storage area network.

(2) Analyzing the correlation matrix with the core-periphery method identified 23 keywords as core words from the perspective of the whole structure. They basically represent the research focuses of DL in China, and include personalized service, user, information retrieval, information resources, information service, grid, ontology, interoperability, data mining, and knowledge management.

     
The density of the network formed by the 65 keywords, based on Pearson’s coefficient, was 0.069, a relatively low level, indicating that DL research in China was relatively decentralized. We rebuilt the co-occurrence matrix for the keywords belonging to each cluster and transformed it into a correlation matrix. The density and degree centrality of each cluster were then re-calculated using Ucinet (shown in Table 5), which supplements the results above.
Table 5 Density and degree centrality of each cluster

| Cluster | Degree centrality | Density |
| --- | --- | --- |
| 1 | −6.98 | −0.28 |
| 2 | 3.95 | 0.95 |
| 3 | 1.72 | 0.07 |
| 4 | 6.93 | 0.65 |
| 5 | 5.02 | 0.38 |
| 6 | 6.87 | 0.89 |
| 7 | 5.07 | 0.19 |

A strategic diagram (Fig. 4) was generated from Table 5. Its origin is (3.22, 0.41), the averages of degree centrality and density. The strategic diagram divides the seven clusters into four quadrants. The x-axis stands for degree centrality, which represents the strength of interaction among research fields. High degree centrality means that a research field tends to lie in an essential and central position. The y-axis stands for density, which represents the internal relations within a specific research field. From the perspective of a research field, density represents the capability to maintain and develop itself (Law et al. 1988). Research topics in quadrant I, with high degree centrality and density, are well-developed and form the core of the field. Research topics in quadrant II are not central but well-developed. Research topics in quadrant III are both marginal and neglected. Research topics in quadrant IV are central in the network but undeveloped (Callon et al. 1991).
https://static-content.springer.com/image/art%3A10.1007%2Fs11192-011-0586-4/MediaObjects/11192_2011_586_Fig4_HTML.gif
Fig. 4 The strategic diagram of the seven clusters

(1) Cluster2, 4, and 6 lie in quadrant I. Both the density and the degree centrality of these clusters are high. High density indicates that these clusters have high internal correlation and that their research topics are well-developed and tending toward maturity in China. High degree centrality indicates that a cluster is widely connected with other clusters. Research topics in these three clusters are the cores of the DL field in China.

(2) Cluster1 and 3 lie in quadrant III. Their low density and degree centrality reveal that the research topics in these clusters are marginal and undeveloped in China. Combined with the results above, this better explains the current research advances of the DL field in China.

(3) Cluster5 and 7 lie in quadrant IV, with high degree centrality but low density. This shows that the research topics in these clusters are central but undeveloped in the DL field in China. Therefore, information service, interoperability and sharing, which will become research trends, need further study.
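
The quadrant placement reported above can be reproduced from the published cluster values. The sketch below takes the mean degree centrality and mean density as the origin, as the text describes, and assigns each cluster to a quadrant. The function name is illustrative; the numbers are the degree centrality and density values reported for the seven clusters.

```python
# cluster: (degree centrality, density), as reported for the seven clusters.
cluster_values = {
    1: (-6.98, -0.28),
    2: (3.95, 0.95),
    3: (1.72, 0.07),
    4: (6.93, 0.65),
    5: (5.02, 0.38),
    6: (6.87, 0.89),
    7: (5.07, 0.19),
}

def strategic_quadrants(clusters):
    """Place clusters in quadrants around the mean centrality and mean density."""
    n = len(clusters)
    origin_x = sum(c for c, _ in clusters.values()) / n
    origin_y = sum(d for _, d in clusters.values()) / n

    def quadrant(centrality, dens):
        if centrality >= origin_x:
            return "I" if dens >= origin_y else "IV"
        return "II" if dens >= origin_y else "III"

    placement = {k: quadrant(c, d) for k, (c, d) in clusters.items()}
    return (origin_x, origin_y), placement

origin, placement = strategic_quadrants(cluster_values)
```

The computed origin is approximately (3.23, 0.41), matching the reported (3.22, 0.41) up to rounding. Cluster5 lands in quadrant IV only narrowly, since its density of 0.38 is just below the mean.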

     
A network chart of all the keywords intuitively shows the structure of research topics of the DL field in China (shown in Fig. 5). The relative size of a node represents the frequency of the keyword, and the relative thickness of a line represents the degree of correlation between two keywords.
https://static-content.springer.com/image/art%3A10.1007%2Fs11192-011-0586-4/MediaObjects/11192_2011_586_Fig5_HTML.gif
Fig. 5 The network structure of keywords in 2002–2011 (a line represents a link between two keywords with a Pearson coefficient ≥ 0.5)
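
The links drawn in such a chart can be derived by thresholding the correlation matrix. The sketch below keeps every keyword pair whose Pearson coefficient is at least 0.5, the cut-off given in the caption; the function name, labels and values are invented for illustration.

```python
def edges_above(corr, labels, threshold=0.5):
    """List (keyword, keyword, coefficient) for pairs at or above the threshold."""
    n = len(labels)
    return [
        (labels[i], labels[j], corr[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if corr[i][j] >= threshold
    ]

# Toy correlation matrix (invented values).
labels = ["user", "personalized service", "storage"]
corr = [
    [1.0, 0.7, 0.1],
    [0.7, 1.0, 0.5],
    [0.1, 0.5, 1.0],
]
edges = edges_above(corr, labels)
```

Each pair is listed once, so the result can be passed directly to a drawing tool as an undirected edge list.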

Conclusion

The research advances of the DL field in China, including the current status and the relationships between research topics, were identified using co-word analysis. Using SPSS19.0 and Ucinet6.0, we obtained clear and reasonable analysis results for the DL field in China.

(1) The core keywords were identified according to frequency, centrality and core-periphery analysis. They mainly include knowledge management, user, personalized service, information retrieval, information resources, information service, grid, metadata, personal digital library, ontology, interoperability, intellectual property, information visualization, portal, and information organization. These keywords indicate the focuses of the DL field in China.

(2) The keywords selected to represent the research topics of the DL field in China were divided into seven clusters, each representing a research direction of the DL field. The correlation among research topics is low on the whole. Compared with international studies, DL research in China is relatively decentralized.

(3) Major research topics of the DL field have formed in China, but there are many smaller, isolated research topics. The research topics in Cluster4 and 6 can be regarded as the cores of the DL field in China. Well-developed core research fields, such as copyright and intellectual property (in Cluster4), are few.

(4) In the DL field, there is more theoretical research and less technological and practical research. Resource construction and information organization of DL are the two significant research topics in China. Recently, information service and interoperability of DL have been receiving more and more attention. Therefore, DL research in China as a whole should integrate resources (as the basis), technology (as the support) and service (as the core).

     

Overall, this study has allowed us to grasp the current status of DL research in China. In future work, we will study the dynamic trends of this field and analyze the international status comparatively, in order to adjust domestic research and connect it with international trends.

Acknowledgments

This study is supported by the project of National Social Science Foundation of China (No. 09CTQ020) and the project of National Social Science Foundation of China (No. 11CTQ006).

Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2011