Social Network Analysis and Mining, Volume 1, Issue 1, pp 27–42

Social networks and online environments: when science and practice co-evolve

Authors

  • Devan Rosen
    • Department of Speech, University of Hawaii
    • Department of Communication, University of California, Davis
  • Jang Hyun Kim
    • Department of Speech, University of Hawaii
Open Access, Original Article

DOI: 10.1007/s13278-010-0011-7

Cite this article as:
Rosen, D., Barnett, G.A. & Kim, J.H. SOCNET (2011) 1: 27. doi:10.1007/s13278-010-0011-7

Abstract

The science of social network analysis has co-evolved with the development of online environments and computer-mediated communication. Unique and precise data available from computer and information systems have allowed network scientists to explore novel social phenomena and develop new methods. Additionally, advances in the structural analysis and visualization of computer-mediated social networks have informed developers and shaped the design of social media tools. This article reviews some examples of research that highlight the ways that social network analysis has evolved with online data. Examples include the international hyperlink network, political blogs and hyperlinks, social media, and multi-user virtual environments. The data available from online environments makes several important contributions to network science, including reliable network flow data, unique forms of relational data across a myriad of contexts, and dynamic data allowing for longitudinal analysis and the animation of social networks.

Keywords

Social network analysis; Computer-mediated communication; Information systems; Hyperlink networks; Social media; Social networking

1 Introduction

A social network is generally defined as a system with a set of social actors and a collection of social relations that specify how these actors are relationally tied together (Wasserman and Faust 1994). Network analysis serves two purposes: revealing the underlying social structures and discovering the dynamic interactions among social actors. It identifies the system’s structure by examining the relations among the system’s components, its actors (Rogers and Kincaid 1981).

Computer and information systems are electronic communication networks that are structured in order that data, information, and messages may be passed from one location in the network to another over multiple links: transmission lines (copper wire, coaxial cable, optical fiber, and wireless connections including satellites) and through various nodes (generally computers). When these networks link people (or higher level social systems such as work groups, organizations, or nations) as well as machines, they become social networks or more precisely computer-mediated social infrastructures. Examples of telecommunication networks include the Internet: the global network of networks (Barnett et al. 2001b; Barnett and Park 2005; Park et al. 2010), public switched telephone networks (POTS/PSTN) (Barnett 1999, 2001; Barnett and Salisbury 1996) and the global Telex network (Ahn and Barnett 1995), as well as numerous proprietary computer networks for the communication of business and financial information (e.g., the ATM network) (Salisbury and Barnett 1999).

The goal of this article is to review some examples of research on social networks in online environments. It will begin by examining the international hyperlink network. The next section will focus specifically on blogs and political hyperlinks. A third section will examine social media such as social networking sites (SNS), followed by a section on networks in multi-user virtual environments (MUVEs) (e.g., Active Worlds, Second Life). The final section will discuss the future of online social networks and contributions to the science of networks, and implications for future research.

2 The international hyperlink network

Few studies have examined the international Internet’s structure. One reason for this is that the Internet is a packet-switched network unlike the telephone, which devotes a single circuit to each individual message. Consequently, the origin and destination of individual messages cannot be determined (Barnett and Park 2005). An alternative approach that allows the examination of international Internet traffic is the analysis of inter-domain hyperlinks (Barnett et al. 2001a, b). A hyperlink is the technological capability that enables a Web site to link seamlessly with another, generally through a click of a mouse (Park et al. 2004). The World Wide Web may be defined as a distributed hypertext system consisting of a virtual network of content and hyperlinks, with billions of inter-linked pages (Almindt and Ingwersen 1997; Kleinberg and Lawrence 2001). The Web has no “engineered architecture”, and as such it is a self-organized system with a well-defined structure of linkage that implies an underlying social structure (Chakrabarti et al. 1999; Shumate and Lipp 2008). This section examines the Web’s emergent social structure as a technological link and communication network at the level of nation states.

In the first large-scale study of the international Internet, Barnett et al. (2001b) examined data on the bilateral Internet links among nations obtained from the Organization for Economic Co-operation and Development (OECD). Counts of the inter-domain hypertext links embedded in Web sites between all TLDs (top-level domains, such as .ca for Canada) of 29 OECD member countries and 6 gTLDs (generic top-level domains: .com, .net, .int, .gov, .edu and .org) were gathered for July 1998 (OECD 1998). These countries represented approximately 96% of Internet traffic for July 1998. However, missing from the analysis were non-OECD members, including such significant Internet users as Brazil, Israel, India, Singapore, and China. Because no one TLD represented Internet traffic for the USA, .edu, .us, and .gov were combined to designate the USA. The other gTLDs, .com, .org, .int, and .net, were not included in this grouping because access to these gTLDs was not exclusively American.

The results indicated that .com was the most central node, followed by .net. Also, the USA was the most central country, the nucleus of worldwide Web traffic. Because the Internet was developed in the USA and because of the country’s low telecommunication costs for high-speed bandwidth, it became the traffic hub. At that time, it accounted for 58% of all Internet hosts, and only 6 of the top 100 Web sites were based outside the country (Cukier 1999). The next most central nations were the UK, Canada, Germany, and Australia. Most peripheral in the network were Iceland and Turkey. The correlation between centrality and GDP was 0.974 (p < 0.000), indicating that a nation’s position in the network was a function of its total wealth. A cluster analysis revealed that the OECD nations and gTLDs formed a single group centered about the .com–.net dyad. There were no sub-groupings due to geography, language, or culture.

The results further revealed that the structure of the Web was related to a number of exogenous variables and preexisting networks, including the international telephone network (r = 0.628, p < 0.000), air traffic network (r = 0.730, p < 0.000), trade network (r = 0.595, p < 0.000), international science citation network (r = 0.486, p < 0.000), international student flows (r = 0.356, p < 0.000), language (r = 0.202, p < 0.002), and asynchrony, defined as the difference in time zones between nations’ capitals (r = 0.113, p = 0.115). Physical distance, however, was not related to the structure of international hyperlinks (r = 0.012, p = 0.416). The cost of communicating via the Internet was unrelated to distance (r = −0.008, p = 0.388). The combined effects of the antecedents indicated that between 62 and 64% of the variance in the structure of hyperlink flows could be accounted for by transportation, telecommunications, science and asynchrony, and either trade or student flows, with transportation being the most significant determinant. These results led the authors to conclude that the Internet represents an autopoietic system (Maturana and Varela 1980; Barnett 2005), evolving through self-replication of the telecommunications network, but with growth to accommodate the physical displacement of the interactants and the ability to rapidly exchange and store vast amounts of information by means other than voice (Barnett et al. 2001a).

Along similar lines, Halavais (2000) examined the role of geographic borders in the hyperlink patterns of 4,000 Web sites. He found that Web sites were most likely to link to another site in the same country. When they did link across national borders, most often it was to hosts in the USA. Brunn and Dodge (2001) analyzed inter-domain hyperlinks among 174 geographic TLDs. They treated Web sites’ incoming and outgoing links separately and developed descriptive statistics and cross-tabulation analysis by country and region. Ciolek (2001) examined the direction and volume of hyperlinks among ten East Asian countries and found that while Japan had the greatest volume of hyperlinks, 92% were directed to other Japanese Web sites. Singapore imported 27% of its links and China 25%. Indonesia attracted 30% of all pages with international links from the other countries. Bharat et al. (2001) found that there was a much higher number of intra-national links than ties to other countries. Typically, only 1% of links were to Web sites in another country. When the links among the most central countries were removed, geographical, linguistic, and political factors impacted the structure of the Web.

Barnett and Park (2005) expanded on earlier research by gathering data on the number of bilateral inter-domain hyperlinks among nations using AltaVista. Included were the TLDs of 47 nations, including all OECD member countries (except Poland), and 6 gTLDs. Notable additions to the earlier research included Brazil, India, China, Russia, South Africa, Israel, Singapore, and Indonesia. These data were collected on 30 January 2003. Together, these TLDs represent approximately 98% of Internet traffic (Internet Software Consortium 2001). Again, because no single TLD totally represents the USA, .edu, .mil, .us and .gov were combined to designate the USA (.usa).

The results indicated that the hyperlink network in 2003 was completely interconnected. As in 1998, the USA was the most central country, followed by Australia, UK, China, Japan, Canada, and Germany. Most peripheral in the network were Uruguay, Luxembourg, UAE, Thailand, Slovakia, and Romania. When the direction of links was considered, the USA was the most central in in-degree, followed by Indonesia, India, Italy, and France. On this indicator, Uruguay, UAE and Czech Republic were the most peripheral. Germany was the most central in out-degree, followed by the UK, USA, and Australia. Indonesia, UAE, and India were the most peripheral. A cluster analysis of the hyperlink network revealed a single group centered about the .usa–.au dyad, the two most central nodes.

Barnett and Park (2005) also analyzed data on the bilateral bandwidth capacity obtained from TeleGeography (2003). Bandwidth determines how physical network components transport packets of data from point to point as opposed to the TCP/IP for which geography is irrelevant (Townsend 2001). These connections are non-directional. The density of the bandwidth network for the 47 countries that compose the hyperlink network indicated that 18.5% of the possible direct links are present for these countries. The USA was by far the most central country in terms of bandwidth, followed by the UK, Germany, Hong Kong, Singapore, Japan, and France. Most peripheral were Iceland, Lithuania, Morocco, Croatia, and Guatemala.
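The density figure can be made concrete with a short sketch. Because the bandwidth connections are non-directional, 47 countries allow 47 × 46 / 2 possible ties. The count of 200 observed ties used below is back-calculated from the reported 18.5% and is an assumption, not a number given in the source.

```python
def undirected_density(n_nodes, n_links):
    """Share of all possible undirected ties that are actually present."""
    possible = n_nodes * (n_nodes - 1) // 2
    return n_links / possible


possible_ties = 47 * 46 // 2  # number of distinct country pairs
assumed_links = 200           # assumption: back-calculated from the 18.5% density

print(possible_ties)
print(round(undirected_density(47, assumed_links), 3))
```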

A cluster analysis resulted in three major groupings: (1) the English speaking countries (USA, UK, Canada, Australia, and New Zealand) with northern Europe (Scandinavia, Belgium and the Netherlands) and eastern Asia, (2) Latin America, and (3) Franco-German Europe (France, Germany, Austria, Italy, Spain, Switzerland, and the Czech Republic). The network resembled a wheel, with the USA at the hub with spokes to the individual countries and clusters of nations. The USA dominated Internet flows due to its position in the network. While there were some links entirely within Europe or the Asian-Pacific region and limited links within Latin America, links between these localities primarily went through the USA. Further, even the connections within specific regions may have been routed through the USA because of limited within-region bandwidth. Clearly, the USA was in position to act as an information broker or gatekeeper in the international Internet.

Townsend’s (2001, p. 1701) examination of the Internet bandwidth resulted in a similar conclusion,

… every region and nearly every country has a direct Internet connection to the United States, direct connections between other countries are less common. Furthermore, direct connections between different major regions such as Asia and Europe are practically nonexistent…This structure dictates that the U.S. Internet infrastructure functions as a massive switching station for traffic that originates and terminates in foreign countries.

Barnett and Park (2005) correlated the hyperlink and the bandwidth networks. The correlation was 0.412 (p = 0.000). Additionally, there was a strong relationship (r = 0.847, p = 0.000) between the two networks’ centralities, suggesting that the connectivity patterns of hyperlinks and bandwidth were similar and that the physical infrastructure of the Internet is an important determinant of which countries communicate via this medium.
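One common way to correlate two networks defined on the same set of nodes is to compute a Pearson correlation over corresponding off-diagonal cells of their matrices. The sketch below shows that idea on invented 3 × 3 matrices. It is an assumption that the source used this cell-wise approach, and the sketch does not test significance, which for network data usually requires a permutation procedure such as QAP.

```python
from math import sqrt


def off_diagonal(matrix):
    """Flatten a square matrix row by row, skipping the diagonal (self-ties)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical matrices over the same three nodes.
hyperlinks = [[0, 9, 4], [8, 0, 2], [3, 1, 0]]  # directed link counts
bandwidth = [[0, 7, 3], [7, 0, 1], [3, 1, 0]]   # symmetric capacity

r = pearson(off_diagonal(hyperlinks), off_diagonal(bandwidth))
print(round(r, 3))
```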

Park et al. (2010) examined the structure of the international hyperlink network in 2009 and how it had changed since 2003. Data were collected in May 2009 using Yahoo. Yahoo acquired AltaVista in 2004 and has kept the database for its search service. Thus, the search algorithm is the same as for 2003. According to http://www.worldwidewebsize.com/ (2009), Yahoo indexed about 47 billion Web sites at that time. The actual data collection examined over 9.3 billion hyperlinks among 33.8 billion sites from 273 TLDs. Again, three TLDs reserved for the exclusive use of American institutions, .edu, .gov, and .mil, were combined with .us to form a node for the USA. Because .com, .org, and .net are not exclusive to the USA, they were not included. This may have resulted in a biased description of the network by underestimating the centrality of the USA and other countries that rely heavily on generic top-level domains.

The 2009 international hyperlink network was completely interconnected. The USA had the largest in-degree centrality, followed by Germany, UK, France, Japan, and Spain. Germany, UK, Japan, France, and Spain, not the USA, had the highest out-degree centralities. Figure 1 shows the positions of the countries and their links, the connection density among the nodes, and the relative strengths of the hyperlink connections among the countries. These results indicate that in the 2009 hyperlink network, the G7 and several EU countries are central. Also, Brazil and Russia have emerged as core countries integrating more peripheral nations: Brazil links South America, and Russia links the former Soviet Republics. Additionally, based on the cluster analysis, it appears that for the first time there are regional, cultural, and linguistic groupings: a Latin American group, cliques centered about Russia and China, a Scandinavian group, and a core group composed of the G7 countries.
Fig. 1

International hyperlink flow network. The size of the concentric circles shows the hyperlink connection density among countries. The thickness of the line connecting two nodes is proportional to the connection density between the two nodes. Only ties with >1,000,000 hyperlinks are shown. All isolates have been removed from the figure. Individual TLDs with gray circles not only represent countries, but also generic TLDs. For example, .TV stands for the island nation of Tuvalu and for television

To investigate the changing global network generated by the World Wide Web, Park et al. (2010) compared the hyperlink relations among 47 countries in 2009 with the same set from 2003. The results for the hyperlink network in 2009 are similar to those reported for 2003 (Barnett and Park 2005). The USA is still the most central country, along with Germany, UK, France, Japan, and Spain. The semi-peripheral countries include Netherlands, Austria, Switzerland, Belgium, Australia, Brazil, Mexico, China, India, and Russia. UAE, Israel, Estonia, Uruguay, and Luxembourg are the most peripheral. Correlations between the two points in time for various measures of centrality provide further evidence for the stability of the network, averaging about 0.80 depending on the measure.

However, the overall correlation between the 2009 and 2003 networks is only 0.406 (p < 0.01), meaning that the 2003 network accounts for only about 16% of the variance in the 2009 network. There were some obvious and interesting changes. First, the international hyperlink network became more highly centralized. The greatest departures from the predicted changes were for the most central countries. Europe as a whole, especially Germany, became much more central. The out-degree centralities of the UK, France, Spain, Italy, and Japan grew more than expected. The in-degree centralities of the USA, Germany, UK, France, Japan, and Spain grew more than expected. Second, the BRIC (Brazil, Russia, India, and China) countries showed various changes. Brazil grew more than predicted, Russia grew as predicted, and China had fewer outward links than expected. Perhaps this is due to internal domestic growth or the use of the Chinese language limiting its contacts with Western Europe. India had fewer inward links than expected. Third, the centralities are distributed as a power curve (Barabasi 2002), suggesting disproportional growth in the number of hyperlinks by the more central countries and support for the notion of preferential attachment. Fourth, while there was only one group in 2003, regional, cultural, and linguistic groupings formed in Latin America and Scandinavia, and around China and Russia, suggesting that hybridization, increased centralization toward core-peripheral countries, and increasing autonomous diversification of semi-peripheral countries took place.
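The variance figure given at the start of the preceding paragraph follows from squaring the correlation coefficient, assuming the usual reading of r² as the proportion of shared variance:

```latex
r^2 = 0.406^2 \approx 0.165
```

That is, roughly 16% of the variance in the 2009 network is shared with the 2003 network.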

There was increased concentration in the network between 2003 and 2009: it became centralized around several hubs. The Gini coefficient for the 2009 network shows that the international hyperlink network is centralized about several countries that act as hubs (the core group: the G7 + Spain). The composite Gini score of the 2009 network was 0.466, while it was only 0.291 in 2003.
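For readers unfamiliar with the measure, the sketch below computes a standard Gini coefficient over a list of node centralities. The two degree vectors are invented, and the source does not specify how its "composite" Gini score was constructed, so this shows only the general technique and does not reproduce the reported values.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal,
    values approaching 1 mean concentration in a few nodes."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical degree centralities for two small networks.
even_network = [10, 12, 11, 9, 10]
hub_network = [2, 3, 2, 40, 53]

print(round(gini(even_network), 3))
print(round(gini(hub_network), 3))
```

A rise in the coefficient between two time points, as reported for 2003 and 2009, indicates that links have become more concentrated in a few hubs.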

An important issue that remains unresolved in international hyperlink research is how imperfect spatial information inadvertently alters the perceived structure of the network (Grubesic and Murray 2005). Traditionally, this research has not included gTLDs. As Barnett et al. (2010) point out, there is an inherent bias in the analysis of the international hyperlink network because it does not include gTLDs’ links in the examination of the links among national TLDs. That is, it does not account for the geographic locations of .com sites. As a result, the connectivity of the USA and other nation states that rely heavily on .com rather than national TLDs is underreported. The gTLDs have probably been excluded because the relations among ccTLDs are easy to mine, whereas it is difficult to determine in which countries .com Web sites reside and who uses them.

Based on the assumption that decomposing .com leads to a more accurate description of the international hyperlink network, Barnett et al. (2010) investigated adjusting the hyperlink network using data from Alexa.com on the percentage of international Internet users for the most frequently visited .com Web sites. They developed a method to decompose the three gTLDs (.com, .org, and .net) into the countries in which their servers or users reside and distributed the links proportionally to the national nodes. They applied the procedures and compared the results obtained with the traditional methods. This was accomplished as follows.

Alexa.com (http://www.alexa.com/topsites, 2009) lists the 500 top Web sites based on the number of average daily visitors and the number of page views. For each listed site, Alexa.com provides the percentage of global Internet users who visited the site the previous day, the average over the last 7 days, and 1 and 3 months. Also, it provides the percentage of users from all countries representing >0.5% of the site’s traffic. In September 2009, Google.com ranked first with an average of 34% of the world’s Internet users visiting the site daily over the previous 3 months. Further, 37.2% of its visitors were from the USA, 9.3% from India, 3.7% from Brazil, and so on. Google.com was followed by Facebook.com (22.6%) and Yahoo.com (25.8%).

To estimate the link strength between two nations on .com, the percentage of Internet users visiting each site was first multiplied by the percentage of that site’s visitors from each country, and the products were then summed over a sample of .com sites. Since the percentage of Internet users visiting the various sites is distributed according to a power law, only the most frequently visited sites were sampled. All .com sites with >0.5% of the world’s Internet traffic during the week of 19–26 September 2009 were examined (N = 110). Based on this analysis, an estimated 26.5% (the sum across sites for US users is 0.694 out of 2.337) of .com traffic involves visitors from the USA, 9.3% from China, 5.8% from Brazil, 4.9% from Japan, 3.3% from UK, and 3.1% from Germany. Next, these numbers were multiplied by the number of hyperlinks to and from .com, which in this data set exceeded 2.1 billion incoming and 4.0 billion outgoing hyperlinks. After the other US domain names (.edu, .gov, .mil, and .us) were subtracted from the total number of incoming links, it still exceeded 2 billion. Thus, the estimated number of incoming hyperlinks from Web sites other than .com for the USA was over 1.5 billion (2 billion times 0.694). For India, this number was 327 million, and for Brazil almost 16 million. Finally, these numbers were added to the reported totals based exclusively on each country’s domain name. This adjustment addressed the systematic bias in international hyperlink analysis. These procedures are summarized in Table 1.
Table 1

Procedures for adjusting international hyperlink strength with .com

1. Multiply daily percentage of users times percentage for each country and divide by 100

2. Repeat for all selected Web sites

3. Sum for all Web sites to determine each country’s total

4. Sum all countries’ totals. It is >100 (because people use more than one of these sites daily on average)

5. Divide each country’s total by the total for all countries. This is the percentage of .com for each country

6. Multiply this percentage for each country times the row and column for .com. This is the estimate of .com hyperlink use for each country

7. Add this value to each country’s hyperlink network tie to each other country (and from each other country). This is the adjusted hyperlink data set
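The steps in Table 1 can be sketched in a few lines of code. Everything below is a hypothetical illustration: the site names, reach figures, country shares, and link counts are invented, and the way the .com row and column are spread over country-to-country ties in `adjust` is one plausible reading of steps 6 and 7 rather than the authors' documented implementation.

```python
# Hypothetical inputs: "reach" is the % of global users visiting a site daily,
# "shares" is the % of that site's visitors coming from each country.
sites = {
    "site-a.com": {"reach": 40.0, "shares": {"us": 50.0, "in": 25.0}},
    "site-b.com": {"reach": 20.0, "shares": {"us": 50.0, "cn": 50.0}},
}


def com_shares(sites):
    """Steps 1-5: each country's estimated proportion of .com use."""
    totals = {}
    for site in sites.values():
        for country, pct in site["shares"].items():
            # Step 1: reach times country percentage, divided by 100.
            totals[country] = totals.get(country, 0.0) + site["reach"] * pct / 100
    grand_total = sum(totals.values())                       # step 4
    return {c: t / grand_total for c, t in totals.items()}   # step 5


def adjust(network, com_to, to_com, shares):
    """Steps 6-7 (assumed reading): spread .com's links over countries.

    network[a][b]: links from country a to country b
    com_to[b]:     links from .com to country b (the .com row)
    to_com[a]:     links from country a to .com (the .com column)
    """
    adjusted = {a: dict(row) for a, row in network.items()}
    for a in network:
        for b in network:
            if a == b:
                continue
            extra = (shares.get(a, 0.0) * com_to.get(b, 0)     # .com -> b, credited to a
                     + shares.get(b, 0.0) * to_com.get(a, 0))  # a -> .com, credited to b
            adjusted[a][b] = adjusted[a].get(b, 0) + extra
    return adjusted


network = {
    "us": {"in": 100, "cn": 50},
    "in": {"us": 80, "cn": 10},
    "cn": {"us": 60, "in": 5},
}
com_to = {"us": 1000, "in": 200, "cn": 300}
to_com = {"us": 500, "in": 100, "cn": 100}

shares = com_shares(sites)
adjusted = adjust(network, com_to, to_com, shares)
print(shares)
print(adjusted["us"]["in"])
```

Because the shares are normalized in step 5, they sum to one, so the links to and from .com are redistributed across countries rather than duplicated.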

Barnett et al. (2010) examined the effects of adjusting the international hyperlink structure by adding the links from the decomposed .com, by comparing the hyperlink relations among 87 countries excluding the .com data with those including the .com links. Both the original and the adjusted networks are displayed in Fig. 2a and b. The adjusted hyperlink network showed significant changes in the centrality of several countries, which make greater use of .com. The US’s out-degree centrality increased and its centrality changed more than any other country when compared to the hyperlink network excluding .com. Also, China, Japan, and India’s centrality notably increased. This is probably due to the strong economic relationships between the USA and these other countries and China’s large portion of .com on specific Chinese language Web sites, such as baidu.com, qq.com and taobao.com. On the contrary, the centrality of countries that do not heavily rely on .com, such as European countries, decreased.
Fig. 2

a International hyperlink structure excluding .com. The size of the concentric circles indicates the hyperlink connection density among countries. The thickness of the line connecting two nodes is proportional to the connection density between the two nodes. Only those ties exhibiting >500,000 hyperlinks are shown. N = 87. b International hyperlink structure including .com. The size of the concentric circles indicates the hyperlink connection density among countries. The thickness of the line connecting two nodes is proportional to the connection density between the two nodes. Only those ties exhibiting >1,500,000 hyperlinks (3 times more compared to hyperlink network excluding .com, based on 3 times degree difference) are shown. N = 87

Correlations between the two sets of centrality scores showed that the addition of .com Web sites did not change the relative network centralities a great deal. The correlations ranged from 0.90 to 0.93 depending on the measure. The cell-wise correlation, however, indicated that there were systematic differences between the two networks (r = 0.755, p = 0.00). The top 20 residuals involved the USA (13), China (5), Japan (4), the UK (2), France (2), Korea (2), Germany (1), Spain (1), Canada (1), and India (1).

Although this research more precisely defined countries as nodes on the Internet by decomposing .com based on where its servers or users reside, there are still several methodological issues that must be addressed. First, these adjustments were not based on the volume of hyperlink connections. They were based on the proportion of Internet users that used certain Web sites and on the users’ country of residence. The hyperlinks to and from .com were distributed to various countries based on their residents’ Web site use. It was assumed that this is an accurate proxy for the distribution of hyperlink connections for the countries. This might not be the case. Second, no indicators of the reliability of the measurement procedures or the validity of Yahoo’s search engine were provided. Third, there were difficulties in addressing nodes that share their domain names with various US states (e.g., Canada, Germany, and Indonesia) (Barnett and Park 2005). Finally, the procedure could define individual countries as nodes on the Internet more precisely if other gTLDs, such as .net, .org, or .eu, could be decomposed correctly. This research only cracked .com.

3 Social network analyses among political blogs

Social network analysis has been used to investigate the thematic and relational aspects of blogs (or weblogs). A blog is a web page that features personal journals or focuses on the outside world, including such topics as current events (Blood 2000). Blogs can be classified into those of general interest and those of specific interest. The former mainly deal with personal thoughts, experiences, and useful information that the bloggers want to share with their visitors. The latter focus on a particular topic, such as cooking, art, politics, international relations, economics, music, popular culture, or literature. Halavais (2004) observes that today many people engage in both abstracting other web pages and generating original content for their blogs. Blogs have become an integral part of the “dynamic” web.

Blogs include two aspects of online interaction: content and relationships. They consist of varied content and of links to other web pages through hyperlinks. Social network analysis has been mainly employed to understand linking practices and their structure (Park and Jankowski 2008) within a certain theme such as politics, a hosting platform (LiveJournal, Herring et al. 2007; Wallop, Lento et al. 2006), or a certain nation (Korea, Park and Jankowski 2008).

Blog hyperlinks involve diverse agencies in political communication, including political parties, activist groups, and individuals. Past studies found that the social network revealed by hyperlink connections represented relationships among those agencies and their roles in the network (Park and Jankowski 2008; Norris 2001; Park et al. 2004, 2005; Park and Thelwall 2003; Thelwall 2004; Kim et al. 2010). This section limits its focus to social network analysis of political blogs, because analyses of blogs with a theme, rather than those of general interest, most clearly demonstrate the efficacy of social network analysis in blog research.

3.1 The nature of political blogs and bloggers

Political bloggers can filter information, proactively seek better information, grasp diverse views, evaluate opinions, and participate in discussions (Blood 2002, 2003). Web feeds (RSS) are good examples of interactivity commonly used by blogs. One can subscribe to a blog by cutting and pasting its RSS address to his/her own blog or blog-reader. Once any part of a blog is updated, the reader will be instantly informed and he/she may reply or comment on the posting (Halavais 2009). The power of blogs lies in transforming both the writers and readers from “audience” to “public” and from “consumer” to “creator” (Blood 2000). Political blogs offer a reciprocal relationship among their users (Halavais 2004).

Kaye (2005) surveyed 3,747 blog readers and found that they were generally young, highly educated, and economically well-to-do. The motivations for blog use included information seeking and checking media facts, convenience of use, personal fulfillment, political and social surveillance, and expression and affiliation. However, one underlying factor throughout the six motivations was political involvement. Although the survey covered blog users in general, it revealed that blogs were basically a medium for political information seeking and participation (Kaye 2005).

McKenna and Pole (2007) report that the contents of political blogs are composed of information about news articles from the mass media, introductions to other blogs’ postings, and criticism of mass media coverage of political affairs. Content that promotes political activism or ideological issues is very rare. McKenna (2007) also reported that political bloggers are motivated by voluntarism (“labors of love”) rather than by commercial purposes.

3.2 Hyperlinking in the political blogosphere

The structure of the political blogosphere reflects a set of relationships among political role takers including citizens, politicians, parties, media or civic organizations (Park et al. 2004), that is, their social network. Adamic and Glance (2005) studied the linking patterns and discussion topics of political bloggers. After examining the posts of 40 “A-list” blogs to study how often political bloggers referred to one another over a 2-month period prior to the US presidential election of 2004, they found that liberals and conservatives link primarily within their separate communities, with far fewer cross-links exchanged between them. Also, they found that the two groups of blogs focused on different news articles, topics, and political figures.
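The kind of within-community versus cross-community comparison described above can be sketched as follows. The blog names, labels, and citation list are invented for illustration and do not reproduce Adamic and Glance's (2005) data or their exact procedure.

```python
# Hypothetical political leaning assigned to each blog.
leaning = {
    "blog_a": "liberal",
    "blog_b": "liberal",
    "blog_c": "conservative",
    "blog_d": "conservative",
}

# Hypothetical citations as (source blog, target blog) pairs.
citations = [
    ("blog_a", "blog_b"),
    ("blog_b", "blog_a"),
    ("blog_c", "blog_d"),
    ("blog_d", "blog_c"),
    ("blog_a", "blog_c"),  # the only cross-community link
]


def within_community_share(citations, leaning):
    """Fraction of links whose source and target share the same leaning."""
    same = sum(1 for src, dst in citations if leaning[src] == leaning[dst])
    return same / len(citations)


share = within_community_share(citations, leaning)
print(f"within-community: {share:.0%}, cross-community: {1 - share:.0%}")
```

A share close to 1 corresponds to the pattern reported above, in which each group links primarily within its own community.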

As Adamic and Glance (2005) indicate, the benefit of hyperlink analysis is that it reveals relational patterns in cyberspace. Extant studies have demonstrated the utility of structural hyperlink analysis of activist organizations (Adamic 1999; Rogers and Marres 2000; Burris et al. 2000; Tateo 2005; Park et al. 2005; Garrido and Halavais 2003). These activist groups used their online presence to take positions on issues such as abortion (for or against), racism and anti-racism, women’s issues, the environment, climate change, and political campaigns (Biddix and Park 2008).

Political blogs run by individuals are hyperlinked with organizational blogs as well as with those of other individuals. The web of these relationships tends to show an unequal distribution in which a small number of blogs attract a majority of hyperlinks (i.e., a power law distribution; Barabasi 2002). This concentration also means that a majority of blogs do not link to, or are not linked by, other blogs (Adar and Adamic 2005; Adar et al. 2004; Wu and Huberman 2004).
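
The concentration of hyperlinks can be summarized with a simple tally of incoming links. The sketch below uses invented blogs and links, and its function names are hypothetical; it only measures how concentrated the inlinks are and does not test whether the distribution actually follows a power law.

```python
from collections import Counter

# Invented directed links (source, target); not real blog data.
LINKS = [
    ("b1", "hub"), ("b2", "hub"), ("b3", "hub"), ("b4", "hub"),
    ("b5", "hub"), ("b1", "b2"), ("hub", "b2"), ("b3", "b1"),
]
BLOGS = ["hub", "b1", "b2", "b3", "b4", "b5", "b6", "b7"]


def inlink_counts(links, blogs):
    """Return the number of incoming links for every blog, including zeros."""
    counts = Counter({blog: 0 for blog in blogs})
    for _source, target in links:
        counts[target] += 1
    return counts


def top_share(counts, k):
    """Share of all inlinks received by the k most-linked blogs."""
    total = sum(counts.values())
    top = sum(n for _blog, n in counts.most_common(k))
    return top / total if total else 0.0


counts = inlink_counts(LINKS, BLOGS)
linking_blogs = {source for source, _target in LINKS}
# Blogs that neither link out nor receive any link.
isolated = [b for b in BLOGS if counts[b] == 0 and b not in linking_blogs]
```

In this toy example a single blog receives most of the inlinks while two blogs are entirely unconnected, which mirrors the pattern described above on a very small scale.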

In addition, political blogs tend to cluster according to their interest or affiliation. For instance, social activist blogs tend to link to other activist blogs with similar movement themes. Further, they link to the political parties they support or criticize. However, the clusters found in the blogosphere are centered around a small number of key nodes (Herring et al. 2005; Schmidt 2007).

This uneven distribution of links has two main implications. First, multilateral linking practices constitute a community of nodes (blogs) with similar topics or positions. Central nodes in the hyperlink network facilitate communication among political bloggers in the group. Also, group norms and rules emerge among enthusiastic participants of the community. Schmidt (2007) reports that blog communities are established with informal rules such as “blog etiquette”, including “crediting the source of a link”. He adds that bloggers who share implicit social norms and meet the group’s expectations are considered members regardless of whether the group has an official member registration process.

Second, the nature of the blog community produces a concentration of hyperlinks within a few blogs, resulting in “cyberbalkanization”, meaning the reinforcement of partisan discourse online. Conservative blogs tend to link to other blogs with a similar political orientation (Republican Party supporter blogs and the Republican Party) rather than to Democratic ones (Adamic and Glance 2005). The same phenomenon was found among liberal bloggers, who showed a salient tendency to hyperlink to Democratic blogs and Web sites (Adamic and Glance 2005; Park et al. 2005).

On the other hand, Hargittai et al. (2005) criticized the balkanization argument. They found that blogs linked to Web sites with dissimilar ideological orientations in their permanent link sections. Further, bloggers linked to external information sources with dissimilar attitudes in their posts. In a similar vein, hyperlinking does not necessarily involve ideological/attitudinal similarity. Park et al. (2005) report that Korean National Assemblymen’s Web sites have more content-neutral navigational outlinks than self-expressive or party-supportive ones. Politicians link to sources for their informational utility rather than their ideological similarity. This finding shows that hyperlinks are sometimes embedded simply to help readers navigate to relevant information sources.

One more noteworthy point is that hyperlinks may be employed to express antipathy toward the targeted Web site. Political resistance sometimes becomes a motivation to link to a Web site. For instance, parties to the ethno-religious conflict between Serbs and Albanians in the former Yugoslavia used Web sites, such as http://www.alb-net.com and http://www.srpska-mreza.com, to release their own portrayals of the conflict and animosity (Sunstein 2001; Balkin 2004).

In addition, Lin et al. (2007) argue that hyperlinks among blogs are empirical indicators of relationships between the cities in which the bloggers live. The frequency of hyperlinks from city A to city B denoted the perceived importance of city B for residents of city A. Both geographical and mental proximity were predictors of intercity, inter-blog hyperlinks.

In sum, blog hyperlink networks show homophily of attitudes, information sources, and antipathy. They also confirm that social network analysis can be used to analyze linking practices and online relationships among individuals. Although central nodes tend to account for the major portion of the whole network, this fact does not diminish the importance of understanding the structure of online relationships.

4 Social media and networking research

Developments in information and communication technologies (ICT) have transformed the relationship between individuals and information (e.g., search engines), and the mediation of individuals with each other in a multitude of contexts (e.g., blogs, e-mail). From a social network perspective, these transformations have guided our access to the information and social resources that we use to navigate our social life. Web 2.0 type applications, such as SNS and MUVEs, have combined these new ways of accessing information and relational contacts to create socio-technical networks that are both transactive and portable, and all of these activities are precisely recorded in event log data that can be extracted for network analysis. It is in this sense that emerging ICTs have provided some of the richest behavioral and structural communication network data. The following section reviews some of the main streams of investigation into social networks and social media.

Research on CMC has seen an explosive increase in attention since the advent of Internet-based ICTs, as the use of online tools has permeated our social lives. More recently, research attention has shifted toward the use of CMC tools, such as weblogs and SNS, to support existing relationships (boyd 2007; Ellison et al. 2007; Kim and Yun 2007; Lackaff et al. 2009; Rosen et al. 2010; Stefanone et al. 2010a). The research on SNS reflects a shift in which Internet users are afforded more ways to create and actively manage online content, often referred to as Web 2.0 (O’Reilly 2005). The ease and flexibility of managing online content in a social networking sense, when combined with persistent connections to one’s social network, lead to unique ways in which individuals behave online and manage their resources.

Traditionally, media content has been the product of media companies, but new user-created and user-focused online platforms such as wikis, blogs, SNS, and media-sharing sites allow for an increase of individual media ownership, and thus personal investment in media content. Lenhart and Madden (2005), for example, found that over half of Internet-using teens create content in the form of blogs and share photos and videos through a variety of other online services such as Facebook, Flickr, and YouTube.

SNSs such as Facebook and MySpace have emerged as a focal point for content creation and social interaction. Over 98% of college students have SNS profiles (PACS Survey 2007). boyd (2007, 2008) found that SNS users model identity through social network profiles so that they can write themselves and their community into being in networked publics. More specifically, “[a process of] articulated expression supports critical peer-based sociality because, by allowing youth to hang out among their friends and classmates, social network sites (SNSs) are providing teens with a space to work out identity and status, make sense of cultural cues, and negotiate public life” (boyd 2007, p. 2). boyd’s research frequently discussed notions of culture, and how SNSs allow users to both learn and perpetuate cultural norms and cues, but has generally focused on subcultures such as youth or gay culture.

An SNS provides a multifunctional platform for personal online content creation, including photo and video sharing, text messaging, commenting on other users’ content, blogging, and, as its main functionality, displaying with whom one is “friends”. This so-called friending allows users to visualize their social network of connections in a photo-based display. SNS friends have access to the content of each other’s personal profile, which is often hidden from non-friends through the use of privacy settings. The profile may contain photos, videos, personal messages “posted” by other friends, and other personal information such as interests and contact information.

Research interest in mediated social networks and SNS use has grown recently, with topics including the study of online social capital from a social support and resource access perspective (Ellison et al. 2007; Lackaff et al. 2009), communicative behaviors from a social-psychological perspective (Stefanone et al. 2010a) and from a cultural perspective (Byrne 2007; Kim and Yun 2007; Rosen et al. 2010), as related to media consumption (Stefanone et al. 2010b), and for a global cooperation network (Roy Lafontaine et al. 2010).

4.1 Social support and social capital

In light of the growing popularity of mediated social networks, mediated social support has emerged as an important research subject. In their discussion of earlier research, Bargh and McKenna (2004) argue that CMC has little direct impact on meaningful social interaction with close friends and family, and that there is no decrease in time spent with strong ties due to Internet use. Rather than substituting for off-line social interaction, they point to evidence that CMC is actually used to help maintain broader social networks (cf. DiMaggio et al. 2001; Howard et al. 2001; Wellman et al. 2001).

Ellison et al. (2007) examined social capital in the context of SNS use. They surveyed college students about their use of Facebook and measured a range of usage behaviors, psychological traits, and social capital, and found a positive correlation between Facebook participation and many forms of social capital, noting that while general Internet use did not predict access to social support (bonding social capital), Facebook use was a significant predictor. They note that this finding warrants the examination of the specific types of online behaviors in the search for explanations of social outcomes.

More recently, Lackaff et al. (2009) examined the ability of SNS users to enact social support. Their findings presented the first results investigating the fundamental issue of the actual value of the “friends” in SNS as related to social capital. They used a two-tier, nested data collection to establish the ability of an individual to get someone in their SNS network to do them a favor. Results showed that the contact frequency, perceived reciprocity, and strength of tie were positively related to the enacted support. Additionally, they found that the personalness of the message was not related to social support. These findings showed that SNS users could enact their online social networks for social support, but it was not proportional to the size of the network, contact, or reciprocity. Also, the strength of ties is important.

4.2 Contingencies of self-worth

SNS behavior has also been investigated from a social-psychological perspective as related to one’s contingency of self-worth (Stefanone et al. 2010a). Contingencies of self-worth afford a more nuanced approach to variable traits, which affect self-esteem and may help explain online behavior. Findings indicated that public-based contingencies explained online photo sharing, while private-based contingencies demonstrated a negative relationship with time online. The appearance contingency for self-worth had the strongest relationship with the intensity of online photo sharing, although no relationship was evident for time spent managing profiles.

4.3 Culture and gender

Rosen et al. (2010) examined the occurrence of culturally and gender-influenced differences in online behavior, off-line networks, and satisfaction. Results revealed that participants who identify with more individualistic cultural backgrounds have larger networks of friends on SNSs, have a greater proportion of friends whom they have not actually met face to face (dubbed promiscuous friending by Stefanone et al. 2008), and share more photos online than participants who identify with less individualistic cultural backgrounds. Social support network size was a significant predictor of satisfaction with life, while SNS network size was not. Findings suggest that participants who identify with more individualistic cultural backgrounds tend to self-promote, are better connected, and are more satisfied with their social lives. It seems that off-line networks are more important than mediated networks in terms of psychological well-being.

Kim and Yun (2007) found that a Korean SNS reflected many of the collectivistic notions of Korean culture. Specifically, the majority of participants utilized SNS to maintain close relationships with a small number of ties instead of creating new connections with people. The findings are in line with previous constructions of collectivistic culture.

Lenhart and Madden (2007) found that gender differences in language choice are clearly observable on SNS. Older teenage girls (ages 15–17 years) were more likely than teenage boys of the same age to use SNS to stay in contact with friends they rarely see face to face and to maintain close face-to-face relationships. Older teenage boys were more likely than girls of the same age to use SNS to flirt and make new friends (Lenhart and Madden 2007). Therefore, in SNS where social and gender context cues are available from postings, participants may spend time reviewing friends’ sites to better understand what constitutes a socially appropriate presentation of themselves (boyd 2008).

4.4 Media consumption

Social cognitive theory suggests a likely relationship between behavior modeled on increasingly popular reality television (RTV) and user behavior on SNS. Stefanone et al. (2010b) surveyed young adults (N = 456) to determine the extent to which RTV consumption explained a range of user behavior in the context of SNSs. Results showed a consistent relationship between RTV consumption and the length of time spent on these sites, the size of users’ networks, the proportion of friends whom they had not actually met face to face, and photo sharing frequency, while controlling for age and gender. Other categories of television viewing were not related to users’ online behavior. Findings suggested a relationship between the consumption of popular traditional mass media content and use of SNSs.

4.5 SNS as global cooperative networks

Roy Lafontaine et al. (2010) investigated engagement activities in an online resource exchange community, CouchSurfing.com, to research elements such as sense of belonging, connectedness, and trust in an SNS-coordinated global online community. CouchSurfing.com is an online community where members coordinate travel accommodations with other members, as well as gatherings for cultural exchange. As such, CouchSurfing.com represents an SNS where the exchange of resources presents tangible off-line commitments that have created a global cooperative network. Findings confirm that members who have not met face to face with other members have a lower sense of belonging to the community than those who have. Increased attendance at gatherings was positively related to sense of belonging to the community, and hosting had a positive relationship with trust in the community. Additionally, CouchSurfers reported that they preferred to be contacted through personal e-mails rather than group e-mails, although those who reported increased participation in gatherings found group e-mails to be useful. A unique element of the CouchSurfing research is that the exchanges on the SNS are connected to actual off-line commitments, which are generally absent from other SNS research, where the implications of friending are generally limited to exposure to information and communicative potential.

5 Social network analysis in virtual environments

Although the graphical quality and range of applications of multi-user virtual environments (MUVEs) have been increasing, interaction within these virtual worlds has remained primarily Internet relay chat (IRC). There has been an increase in the use of Voice over IP (VoIP) in MUVEs, but the larger, community-oriented MUVEs still use mainly IRC. Users generally appear as avatars (visual representations of an individual in-world) in the virtual worlds along with communicative fields, such as a text box where they can post comments and track the discussion of other users. Text boxes displaying IRC have been a successful tool for communicative interaction. However, IRC poses a difficulty for researchers seeking to analyze and interpret communicative interaction, since data are stored in the form of chat logs that can often run to thousands of pages. The current section discusses the methodological procedures that have been developed for the representation and analysis of chat interaction in MUVEs as social networks.

IRC in MUVEs is conducted in a semi-synchronous way, where comments posted appear almost instantly for other users to view and respond to. IRC is a much more real-time mode of computer-mediated communication than listserv messages, bulletin boards, and e-mail. Much like instant messaging (IM), IRC allows users to select a set username that appears before each posted comment allowing multiple users to comment and maintain conversational interaction. Posts to IRC conversations are generally quite short, usually one or two lines, allowing the IRC interaction to be similar to multi-participant face-to-face conversation (Paolillo 1999).

IRC interaction is conducted within a chat box that displays all users’ comments along with their username in a log file. In addition to being semi-synchronous, IRC interaction is also persistent. Since face-to-face interaction is generally ephemeral, it is very difficult to refer back to previous parts of the conversation for reference, something that is possible via IRC. The persistence of these interactions allows for the storage of all data as chat logs, which can in turn be used for analyses of the users’ interaction. However, the nature of chat logs as a dynamic, non-threaded interaction introduces some methodological hurdles regarding network analysis. Chat sessions are stored as log files containing the raw chat data, including metadata such as time-stamps and user ids attached to every comment. The metadata allow for the precise tracking of the source and pacing of interaction.
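
As a minimal sketch of how such log metadata might be recovered, the example below parses a few invented chat lines into time-stamped records and computes the pacing between comments. The line layout, the sample log, and the function name are assumptions made for illustration; actual MUVE logs use their own formats.

```python
import re
from datetime import datetime

# Assumed line layout: "[YYYY-MM-DD HH:MM:SS] <user> comment text".
# Real MUVE logs differ; adjust the pattern to the system at hand.
LINE_PATTERN = re.compile(r"^\[(?P<ts>[^\]]+)\] <(?P<user>[^>]+)> (?P<text>.*)$")

# Invented sample log, not taken from any study.
SAMPLE_LOG = """\
[2009-05-01 10:00:00] <mentor1> Welcome everyone
[2009-05-01 10:00:07] <student1> hi!
[2009-05-01 10:00:31] <student2> how do I move my avatar?
"""


def parse_chat_log(raw):
    """Turn raw log text into a list of (timestamp, user, text) tuples."""
    events = []
    for line in raw.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        stamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
        events.append((stamp, match["user"], match["text"]))
    return events


events = parse_chat_log(SAMPLE_LOG)
# Pacing: seconds elapsed between consecutive comments.
gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
```

Records of this kind identify who spoke and when, but because the chat is non-threaded they do not by themselves indicate who was addressing whom.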

There have been advancements in the analysis of networked interaction in virtual communities in a number of areas. Smith et al. have added a substantial cache of methods and perspectives by studying the interaction and structure of Usenet (Smith 1999; Turner et al. 2006). Usenet is an online bulletin board-type system whose groups are commonly called newsgroups, although they are not necessarily associated with news, as many “newsgroups” are for recreational, technical, political, and cultural topics. One of the benefits of analyzing Usenet is that the interaction is threaded in tree-like structures, where conversations lead to sub-threads and content can be cross-posted to multiple newsgroups. This threaded structure provides a clear distinction of who is responding to whom, when the response was posted, and which groups are associated with the interaction. Using these data, Smith et al. have explicated a number of methods for the structural analysis of these online communities via their Netscan project and have written on interaction, participant contribution, and newsgroup hierarchies.
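
Because each threaded post points to the post it answers, a who-replies-to-whom network can be read off directly. The following sketch assumes invented posts with a hypothetical (message id, author, parent id) layout; it is not the Netscan method, only an illustration of why threading makes the reply structure explicit.

```python
from collections import Counter

# Invented threaded posts: (message_id, author, parent_id or None for a thread root).
POSTS = [
    (1, "ann", None),
    (2, "bob", 1),
    (3, "cat", 1),
    (4, "ann", 2),
    (5, "bob", 4),
]


def reply_edges(posts):
    """Count directed reply ties (replier -> author of the parent post)."""
    author_of = {message_id: author for message_id, author, _parent in posts}
    edges = Counter()
    for _message_id, author, parent_id in posts:
        if parent_id is None:
            continue  # a root post replies to no one
        edges[(author, author_of[parent_id])] += 1
    return edges


edges = reply_edges(POSTS)
```

Each key in `edges` is a directed tie and each value is its strength, which is the form of data most network measures expect.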

Smith has also investigated the social life of small graphical chat spaces by analyzing Microsoft’s V-Chat systems (Smith et al. 2000). The V-Chat research illustrates the usage patterns of graphical chat systems, illuminating the ways physical proxemics are translated into social interactions in online environments. Krikorian et al. (2000) developed methods to study user proximity in graphical chat rooms and found that various perceived demographics influenced the social “distance” of avatars in the graphical chat environment.

In addition to the structural analysis, there have also been a number of methodological advancements regarding the communicative content of virtual environments. Sack (2000) generated conversation maps of newsgroup postings and described very large conversations by visualizing large amounts of interaction in newsgroups. Suthers et al. (2010) developed a framework for representing and analyzing distributed interaction within MUVEs, including some structural representation of interaction in sequential records of events. However, the methods developed were microanalytic and have yet to be employed on large-scale data. Rosen et al. (2003) explicated a methodology for semantic network analyses of IRC interaction in MUVEs, representing a methodological advancement in the quantitative analysis of the content of IRC interaction. However, there had been little to no development of methods to extract social networks from IRC interaction until Rosen (2010) and Rosen and Corbit (2009) developed network analytic techniques for the measurement and representation of networks in IRC-based MUVEs. Many of these techniques map, display, and study thread-based online communities, such as Usenet groups; whereas graphical chat rooms sequentially log chat interaction, which is difficult to separate and analyze as sub-groups, parsed interaction, or as a structural system.

Even though Rosen et al. (2003) analyzed the content of IRC interaction in MUVEs using semantic network analysis, there still remained a gap in procedures to extract structural social networks from IRC. Many of the parallel online community (e.g., Usenet) and social media (e.g., SNS) research streams have benefited from structural analysis and social network representation. Interaction via IRC is still one of the most common forms of interaction in a variety of contexts (e.g., online gaming, educational environments), yet its structure remains cloaked behind the log file format used to store IRC. Understanding the structure of the interaction provides an in-depth and unique window into MUVEs along several lines. First, network position can be used to identify network roles, such as the answer person and question person roles identified by Turner et al. (2006). Second, network analytic techniques can be applied to the resulting data. Finally, network visualizations can be generated, allowing for visual and representational analyses (see Fig. 3) of elements that are traditionally important to community research (Preece and Maloney-Krichmar 2005).
Fig. 3

Network representation of IRC-based interaction in a MUVE. The color of the nodes indicates the different types of users (in this example, student nodes are lighter shaded and mentor nodes are darker shaded). The thickness of the line connecting two nodes is proportional to the connection strength between the two nodes. Labels indicate the case-specific roles filled by each individual (specifics not pertinent to this review)

6 Future directions and implications for network science

The science of social networks has progressed in parallel with the use of computer and information systems. Measuring information flows has been one of the main challenges of communication network analysis, and the development of information systems has provided social network scientists with a precise representation of such flows and the ability to advance the state of science. Additionally, the increased theoretical understanding and analytic representation of computer and information systems has provided developers with a greater sense of how people and social organizations utilize technology to manage the resources embedded in their social networks. The science of social network analysis and the development of information systems have co-evolved as catalysts of each other’s development and advancement, and the future of both is inextricably bound.

Precise measures of social networks have mirrored the increased use of ICTs. From microblogging networks (e.g., Twitter) to global cooperation networks (e.g., CouchSurfing), the behaviors, material, and non-material exchanges of individuals and larger social systems are recorded, presenting a wealth of data. From a scientific perspective, the implications of having access to so much data in a variety of forms are fourfold. First, we now have access to measurements of social network relations that are more reliable than in the past. Second, the nature of the relational data is unique, allowing for the analytical exploration of network structures in ways that push the boundaries of the science. Third, analysis of network evolution is increasingly possible because much of the data available through ICT is dynamic; collecting dynamic network data has historically been a persistent challenge. Fourth, dynamic network data has contributed to methods for the animation of social networks.

Reliable data are the foundation of most science, but complete and reliable data are particularly important for the analysis of social networks. It is paramount that researchers are able to capture accurate and complete network data, since network composition and flow can be greatly changed by the removal or inclusion of any particular node. There has been a flood of available accurate, large, complete data from several sources. First, developers and businesses (e.g., Twitter) are making their data sets available to social network scientists, data that contain precise traces of activity among entire populations. Second, researchers are utilizing information technology to capture multiplex flow data from their samples that allow for a multitude of network analytics. The combination of publicly available large-scale data sets with precisely recorded case-study data enables collaboration and validation of data collection, recording, and analysis techniques.

Unique data allow a scientific field to grow its foundation and expand its reach, and the science of networks has seen an explosion of new data forms. From geospatial network data to small-world diffusion networks, the complexity of relations possible in the social and natural world poses many opportunities to network researchers. Information technology has enabled much of the data explosion, but has also provided a parallel benefit in that scientists are better able to collaborate on new methods and distribute their findings (and data) very rapidly. The very distribution of tools and the citation of research have become a valuable form of unique network data.

Dynamic data have been a desire for network scientists since the birth of the field. A majority of social network articles have cited the need for dynamic data as a future direction for, or a drawback of, their current research. Indeed, dynamic data are very difficult to collect and analyze, especially for matrix algebraic techniques. Yet, recent developments in methods and available data have enabled a rapid growth in the field of dynamic network analysis (Snijders 2005). Tools such as Siena (Snijders et al. 2005) and SoNIA (McFarland and Bender-deMoll 2007), along with the methods afforded by dynamic network analysis packages, have allowed the state of the science to develop hand in hand with the torrent of over-time data available. Many of the computer-mediated forms of communication have a time stamp embedded in the log files, which allows for the preservation of the dynamic nature of social interaction instead of forcing the compression of the data into a cross-sectional aggregate. Visualization techniques have also been rapidly improving, and with dynamic data comes animation.
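
The difference between preserving time stamps and collapsing them into a cross-sectional aggregate can be shown with a small sketch. The events, the 60-second window, and the function name below are assumptions for illustration; this is not how Siena or SoNIA operate internally, only a way to see what the aggregate discards.

```python
from collections import defaultdict

# Invented time-stamped ties: (seconds since start, sender, receiver).
EVENTS = [
    (5, "a", "b"),
    (40, "b", "c"),
    (65, "a", "b"),
    (70, "c", "a"),
    (130, "b", "a"),
]


def slice_network(events, window):
    """Group ties into consecutive windows of `window` seconds each."""
    slices = defaultdict(list)
    for time, sender, receiver in events:
        slices[time // window].append((sender, receiver))
    return dict(slices)


slices = slice_network(EVENTS, window=60)
# Cross-sectional view: the set of distinct ties, with ordering lost.
aggregate = {(sender, receiver) for _time, sender, receiver in EVENTS}
```

The sliced version shows, for example, that the tie from b to a appears only in the last window, a detail that the aggregate set cannot express.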

Animation of social networks is the most recent advance in the visual representation of network data. Visual representations of networks have aided in the analysis and elaboration of social networks since the first sociograms. Until recently, most visualizations represented a singular window into the network, being either a representation at a single moment of time or a summation of a specific window of time. Now tools combined with dynamic data allow for a full animation of network evolution (Moody et al. 2005; Elbirt and Barnett 2006). It is in this area that network science is making its biggest advances. As the operationalization of systems theory, network science is increasingly able to elaborate the evolutionary nature of social systems. For example, Elbirt and Barnett (2006) have animated the evolution of a number of different networks, including the international telephone network, the Canadian migration network, and the US Senate, revealing unique patterns that are difficult to see in cross-sectional representations. Using a different approach, McCulloh and Carley (2008) use Fourier transforms to enable dynamic animation of cell phone network data.

The future of network analysis truly lies in the network itself: in the network of researchers developing the analytics that reveal new structures and images, in the network of developers who use these new windows to enable a new frontier of web tools and social affordances, and in the networks that make up our lives. Everything we do in our social lives is inescapably embedded in the larger network of interactions of everyone else, and those connections are cloaked only by our inability to sense them. Networks are showing us that we are all part of the same ever-changing and evolving system, a dynamic wholeness, and the visual and empirical knowledge of those connections may be the most likely tool to unite us.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Copyright information

© The Author(s) 2010