AI & Soc, Volume 26, Issue 1, pp 71–85

Mobile social group sizes and scaling ratio

Original Article

DOI: 10.1007/s00146-009-0230-5

Cite this article as:
Phithakkitnukoon, S. & Dantu, R. AI & Soc (2011) 26: 71. doi:10.1007/s00146-009-0230-5


Social data mining has become an emerging area of research in information and communication technology. Its scope has expanded significantly in recent years with advances in telecommunication technologies and the rapidly increasing accessibility of computing resources and mobile devices. People increasingly engage in and rely on phone communications for both personal and business purposes, and mobile phones have become an indispensable part of life for many people. In this article, we perform social data mining on mobile social networks by presenting a simple but efficient method to define social closeness and social grouping, which are then used to identify social group sizes and a scaling ratio close to “8”. We conclude that the mobile social network is a subset of the face-to-face social network and that the two groupings are not necessarily the same; hence the scaling ratios are distinct.

Mobile social data mining.

1 Introduction

Humans have evolved as fundamentally social creatures. Our beliefs and behavior have been shaped by our social context. Understanding social context and its structure can help uncover the concealed patterns that shape our behavior. As technology advances, we have created different ways of social networking. Besides conventional face-to-face social networking, we now interact with people on online and mobile networks, which inherit some fundamentals of face-to-face social networking and also introduce new elements and concepts. As mobile networks expand rapidly to serve the rising number of mobile phone users, more mobile social services are being developed and offered. Understanding the mobile social network is the first and essential step toward creating intelligent functionality that genuinely enhances quality of life with a system that comprehends the behavior and context of its user(s).

Human social grouping patterns have been studied extensively in both sociology (Coleman 1964) and social anthropology (Kottak 1991; Scupin 1992). Dunbar (1992) proposed that humans had a cognitive limit of about 150 on the number of individuals with whom coherent personal relationships could be maintained. Later, Zhou et al. (2005) identified a social group scaling ratio of “3” as social network members were divided into six groups based on social connectivity with group sizes of about 3–5, 12–20, 30–50, 150, 500, and 1,000–2,000 people.

To the best of our knowledge, no scientific research has been reported on identifying mobile social group sizes and their scaling ratio. It is therefore both interesting and important to investigate them, for a better understanding of the mobile social network and for a useful comparison with the face-to-face social network. The results of the investigation can also be related to behavioral grouping signatures, the cognitive processes of the human brain for social closeness, and the mechanisms governing human grouping dynamics.

The rest of the article is organized as follows. Section 2 describes the methods for inferring mobile social closeness and groups, followed by a description of the datasets used in this study and the experimental results validating our approach. Section 3 presents the mobile group sizes based on the actual data and carries out two analytical methods for identifying the scaling ratio of the mobile social groups: the first is a simple analysis based on mean group sizes, and the second is a generalized q-analysis using all group clusters. Related work on social closeness and scaling ratio is briefly reviewed in Sect. 4. Section 5 discusses the results of the study and their relation to the face-to-face social network. The limitations of our study are described in Sect. 6. Finally, Sect. 7 concludes the article with a summary and an outlook on future work.

2 Mobile social closeness and grouping

The social science literature (Granovetter 1973; Marsden and Campbell 1984) has discussed the social closeness of people based on the amount of time and intensity of communication. Granovetter (1973) found that the time spent in a relationship, together with its intensity, intimacy, and reciprocal services, formed a set of indicators of a social tie. That paper predicted that the strength of an interpersonal tie was a linear combination of the amount of time, the emotional intensity, the intimacy (mutual confiding), and the reciprocal services in a relationship. Marsden and Campbell (1984) evaluated the indicators and predictors of tie strength described by Granovetter (1973) and concluded that “social closeness” or “intensity” provided the best indicator of tie strength. Their conclusion was based on a survey study of 2,329 human subjects drawn from three cross-sectional surveys conducted in two American cities (Detroit, Aurora) and a small city in the Federal Republic of Germany. Respondents were asked to identify their three closest friends and to report characteristics of these persons such as age, occupation, religion, and so on.

In a mobile social network, the amount of time and intensity of communication can be measured by call duration (talk time) and call frequency (number of phone calls). In our daily life, we communicate with people in the mobile network at different times. These people constitute our mobile social network. Based on the amount of time and intensity of communication with these people, our mobile social network can be divided into three broad groups:

Group 1: Socially closest members

These are the people with whom we maintain the highest social connectivity. Most of the calls we receive come from individuals within this category. We receive more calls from them, and we tend to talk with them for longer periods. Typically, the face-to-face social ties of these people are family members, friends, and colleagues.

Group 2: Socially near members

People in this group are not as highly connected as family members and friends, but when we connect to them, we talk to them for considerably longer periods. Mostly, we observe intermittent frequency of calls from these people. These people are typically neighbors and distant relatives.

Group 3: Socially distant members

These individuals have less connection with our social life. They call us less frequently, and we rarely acknowledge them. Examples include a newsletter group or a private organization to which we have previously subscribed. This group also includes individuals who have had no previous interaction or communication with us. We have the least tolerance for calls from them, e.g., strangers, telemarketers, and fund raisers.

We quantitatively define the social closeness between user i and user j from the perspective of user i, S(i, j), by Eq. (1):
$$ S(i, j) = \sqrt {(1 - F(i,j))^{2} + (1 - T(i, j))^{2} } , $$
where F(i, j) is the normalized call frequency between user i and user j (normalized to the maximum call frequency among all users with whom user i communicates), given by Eq. (3), and T(i, j) is the normalized call duration or talk time between user i and user j (normalized to the maximum talk time among all users with whom user i communicates), given by Eq. (4). Normalization aligns all associated users (callers and callees) of user i onto a common reference scale ranging from 0 to 1, where 0 means the minimum and 1 means the maximum, so that they can be compared systematically.
Because F(i, j) and T(i, j) are normalized values between zero and one, S(i, j) takes values in the range \( [0,\sqrt 2 ] \) and indicates the mobile social closeness between user i and user j from user i's perspective, where 0 implies the closest and \( \sqrt 2 \) the farthest relation. Based on this quantity, we can categorize all users associated with user i into three social groups using the following simple algorithm (Fig. 1 gives a graphical representation).
Fig. 1

Graphical representation for identifying boundaries of mobile social groups

Let R denote the Euclidean distance from coordinate (μF, μT) to (1, 1), where μF and μT are the means of F(i, j) and T(i, j), respectively and \( j \in U_{i} \). If \( S(i,j) \le R/2 \), then user j belongs to Group 1, if \( R \ge S(i,j) > R/2 \), then user j belongs to Group 2, and if S(i, j) > R, then user j belongs to Group 3.
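
The grouping rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names and the sample (F, T) values are hypothetical, and the inputs are assumed to be already normalized to [0, 1].

```python
import math


def social_closeness(f_norm, t_norm):
    """Eq. (1): Euclidean distance from (F, T) to the ideal point (1, 1)."""
    return math.sqrt((1.0 - f_norm) ** 2 + (1.0 - t_norm) ** 2)


def classify_groups(features):
    """Map each Associated User to social group 1, 2, or 3.

    `features` maps a user id to its (F, T) pair, both already in [0, 1].
    """
    if not features:
        return {}
    mu_f = sum(f for f, _ in features.values()) / len(features)
    mu_t = sum(t for _, t in features.values()) / len(features)
    radius = social_closeness(mu_f, mu_t)  # R: distance from (mu_F, mu_T) to (1, 1)

    groups = {}
    for user, (f, t) in features.items():
        s = social_closeness(f, t)
        if s <= radius / 2:
            groups[user] = 1  # socially closest
        elif s <= radius:
            groups[user] = 2  # socially near
        else:
            groups[user] = 3  # socially distant
    return groups


# Hypothetical (F, T) values, chosen only to exercise all three branches.
example = {"a": (1.0, 1.0), "b": (0.6, 0.6), "c": (0.0, 0.0)}
groups = classify_groups(example)
```

Because R is derived from the Center User's own means, the group boundaries adapt to each user's calling pattern instead of relying on fixed thresholds.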

Because social closeness and social group are defined according to the perception of user i, we can use the analogy of a circle: user i is referred to as the Center User, and the distance from the center of the circle (the Center User) represents the closeness of the social relationship to each of the other users, referred to as Associated Users.

Property 1

Social closeness is typically not symmetric but can be symmetric under a specific condition.

Social group is based on social closeness, which is measured by the amount of time and intensity of communication between the Center User and an Associated User. Social closeness is computed according to the Center User’s perception of each Associated User compared with all other Associated Users. Since different Center Users may have different Associated Users with different amounts of time and intensity of communication, social closeness is rarely symmetric. For example, user j may be perceived by user i as a member of group 1, while user i is perceived by user j as a member of group 2, because user j has other Associated Users with whom user j communicates more than with user i. A mathematical proof is given in the Appendix (Sect. 9.2).
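
A small numerical sketch can make the asymmetry concrete. The call counts and talk times below are invented for illustration, and the helper name is hypothetical; the same calls between i and j are simply normalized against each user's own maxima.

```python
import math


def closeness(calls, minutes, max_calls, max_minutes):
    """Eq. (1) applied to raw counts normalized by the Center User's maxima."""
    f = calls / max_calls
    t = minutes / max_minutes
    return math.sqrt((1.0 - f) ** 2 + (1.0 - t) ** 2)


# Users i and j exchanged 10 calls totalling 50 minutes (made-up numbers).
# For user i, j is the most frequent and longest contact.
s_ij = closeness(10, 50.0, max_calls=10, max_minutes=50.0)
# For user j, another contact has 40 calls and 200 minutes, so i ranks lower.
s_ji = closeness(10, 50.0, max_calls=40, max_minutes=200.0)
```

Here S(i, j) is zero while S(j, i) is larger than one, so the two users place each other at different distances even though the underlying calls are identical.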

Property 2

Social closeness and social group change over time.

In our daily life, relationships inevitably change over time. Meeting new people with whom closer relationships are established, and losing touch with others so that those relationships become more distant, are part of our social life. It is inherently true in the mobile social network that social closeness changes over time. Situations bring people together and take them apart. These situations can be work, school, a hobby, or any event in life. As soon as phone numbers have been exchanged or given, a new social member may arise and possibly gain a closer relationship as time progresses. Thus, social closeness and social group change over time.

2.1 Real-life datasets

In this study, we use two sets of real-life call logs covering 30 users in total, with nearly 3,000 associated callers/callees and over 46,000 call activities. Note that these users used traditional voice calls as their only means of communication (i.e., no texting or use of the internet). Our first dataset consists of 3-month call logs of 20 individual mobile phone users, collected at the University of North Texas (UNT) during the summer of 2006. These 20 individuals were faculty, staff, and students. The call logs were collected as part of the Nuisance Project (Kolan et al. 2008), in which Kolan et al. (2008) studied the nuisance level associated with each phone call. Our second dataset consists of 3-month call logs of ten mobile phone users, collected during the summer of 2008 at UNT. These ten subjects were also faculty, staff, and students. In addition, while collecting the second dataset, we interviewed the subjects about their social closeness to all of their Associated Users by having each subject identify the perceived social group of each Associated User. As a result, our second dataset includes additional information on the social group corresponding to each Associated User. The details of the data collection process are described in Phithakkitnukoon et al. (2008). The survey is included in the Appendix (Sect. 9.3).

As part of the data collection process for both datasets, each user downloaded 3 months of detailed telephone call records from his/her online account on the mobile phone service provider’s website. Each call record in the dataset is a 5-tuple as follows (an example call record is shown in Fig. 2):
Fig. 2

An example of call record. Note that User IDs have been modified to protect privacy

Call record: {Date, Start time, Type, User ID, Talk time} where
  • Date—date of call

  • Start time—start time of call

  • Type—type of call i.e., “Incoming” or “Outgoing”

  • User ID—caller/callee identifier

  • Talk time—duration of call (in minutes)
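
As an illustration of how such records could feed the closeness computation, the sketch below aggregates 5-tuple call records into normalized call frequency F and talk time T per Associated User. It assumes that incoming and outgoing calls are counted together and that normalization divides by the per-user maximum, as described in the text; the function name and the sample records are hypothetical.

```python
from collections import defaultdict


def normalized_features(records):
    """Return {user_id: (F, T)} for one Center User's call log.

    Each record is (date, start_time, call_type, user_id, talk_minutes).
    Assumption: incoming and outgoing calls both count toward F and T.
    """
    calls = defaultdict(int)
    minutes = defaultdict(float)
    for _date, _start, _call_type, user_id, talk_minutes in records:
        calls[user_id] += 1
        minutes[user_id] += talk_minutes
    if not calls:
        return {}
    max_calls = max(calls.values())
    max_minutes = max(minutes.values())
    return {
        user: (
            calls[user] / max_calls,
            minutes[user] / max_minutes if max_minutes > 0 else 0.0,
        )
        for user in calls
    }


# Made-up call log for a single Center User.
records = [
    ("2008-06-01", "09:15", "Outgoing", "U01", 4.0),
    ("2008-06-01", "12:30", "Incoming", "U01", 6.0),
    ("2008-06-02", "18:05", "Incoming", "U02", 20.0),
]
features = normalized_features(records)
```

In this toy log, U01 has the most calls while U02 has the longest total talk time, so neither user reaches the ideal point (1, 1) on both axes.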

2.2 Validation of social grouping

To validate the accuracy of our social closeness/group computation, we use our second dataset, which contains social group information. We are able to identify social groups correctly with an overall accuracy rate of 93.8%. The detailed results are shown in Table 1, which presents the number of correct classifications (Hit), the number of incorrect classifications (Miss), and the accuracy rate \( \left( {{\frac{\text{Hit}}{{{\text{Hit}} + {\text{Miss}}}}}} \right) \) for each Center User.
Table 1

The result of validation of social group calculation, which includes the number of correct/incorrect classification (Hit/Miss) based on our social closeness calculation and group classification, and the accuracy rate for each user




Accuracy rate (%)

Based on follow-up interviews with these ten subjects, most of the misses are caused by confusion between face-to-face social closeness and mobile social closeness. For example, one of the subjects identified his roommate as a group 1 member, but because the subject sees his roommate quite often, he does not make or receive many phone calls to or from him. As a result, the roommate is classified into group 2 by our calculation (Eq. (1)) but identified as a group 1 member by the subject. To avoid biased feedback from the subjects, we did not provide any information about our social closeness computation, nor any more detail about the three social groups than the description given earlier in this section. Nevertheless, we believe that the accuracy rate is good; in addition, we do not have a single incorrect classification that is off by more than one level of social group.

Furthermore, Property 2 states that social relationships change over time. We validate this property experimentally with our real-life datasets by showing, in Fig. 3, an actual social group plot of a randomly selected Center User. Associated User 8 was a member of group 1 (Fig. 3a), but as time progressed, he/she changed calling behavior toward the Center User (or the Center User changed his/her calling behavior toward Associated User 8), which moved the relationship further apart and made user 8 a member of group 2 thirty days later (Fig. 3b).
Fig. 3

a Social relationship at time T and b social relationship at 30 days later (T + 30)

As texting becomes a popular mobile means of communication, one may be curious about how to apply the proposed computational framework when texting information coexists with voice calls in the call logs. According to our definition of social closeness (Eq. (1)), social closeness can be estimated from the intensity of communication, which is measured by call duration and call frequency. With texting, the frequency can simply be measured by the number of text messages (both incoming and outgoing). Even though a text message has no explicit duration, its length (the number of characters typed) can serve as the counterpart of call duration. The social closeness computation given in Eq. (1) can then be rewritten as
$$ S(i,j) = W_{V} \sqrt {\left( {1 - F_{V} (i,j)} \right)^{2} + \left( {1 - T_{V} (i,j)} \right)^{2} } + W_{T} \sqrt {\left( {1 - F_{T} (i,j)} \right)^{2} + \left( {1 - T_{T} (i,j)} \right)^{2} } , $$
where subscripts V and T represent Voice and Text, respectively. The variables WV and WT are the weights of the voice and texting communication means. Social closeness is thus estimated from both mobile means, with either the same contributing load (WV = WT = 0.5) or different contributing loads (WV ≠ WT and WV + WT = 1). The most suitable values of these weights remain to be studied.
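
The weighted form can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the per-channel (F, T) pairs are assumed to be already normalized to [0, 1], and the text weight is taken as one minus the voice weight.

```python
import math


def channel_distance(f_norm, t_norm):
    """Distance from a channel's (F, T) point to the ideal point (1, 1)."""
    return math.sqrt((1.0 - f_norm) ** 2 + (1.0 - t_norm) ** 2)


def combined_closeness(voice, text, w_voice=0.5):
    """Weighted sum of the voice and text distances, with W_V + W_T = 1.

    `voice` and `text` are (F, T) pairs; for text, T is the normalized
    message length used as the counterpart of call duration.
    """
    if not 0.0 <= w_voice <= 1.0:
        raise ValueError("w_voice must lie in [0, 1]")
    w_text = 1.0 - w_voice
    return w_voice * channel_distance(*voice) + w_text * channel_distance(*text)


# Made-up contact who calls often but never texts.
s_equal = combined_closeness(voice=(1.0, 1.0), text=(0.0, 0.0))
s_voice_only = combined_closeness(voice=(1.0, 1.0), text=(0.0, 0.0), w_voice=1.0)
```

Since each channel's distance lies in the range from 0 to the square root of 2 and the weights sum to one, the combined value stays in the same range as Eq. (1), so the same grouping rule can be applied to it.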

One may also raise the issue that communication intensity is a subjective judgement. This is clearly true: the definition of high and low communication intensity depends entirely on the perception of each individual subject. Suppose we acquire feedback about communication intensity from two subjects, Subj. #1 and Subj. #2, who know each other and have been communicating via mobile phones. Their feedback may differ because “the intensity is a subjective judgement”; e.g., Subj. #1 may say that he/she has high communication intensity with Subj. #2, whereas Subj. #2 may think that he/she has low communication intensity with Subj. #1. The perceived intensity levels differ because each subject judges the intensity after comparing the other subject with his/her other associated contacts (other persons with whom the subject has been communicating via mobile phone). The perceived intensity is therefore estimated based on the subject’s past communications with all contacts, and the subject is the center (reference) point of perception. In our study, the feedback and computation are based on “one” reference point of view, that of the Center User, who gives feedback from his/her own perception about the communication intensity between him/her and each of his/her Associated Users (callers/callees). We do not compare one subject’s perceived intensity against another’s; rather, we classify the social groups based on one subject’s perceived intensity (and computed intensity). Thus, although intensity is a subjective judgement, this does not alter our results, because we take the perceived intensity as the ground truth for verifying our computation, not as a basis for arguing that one subject’s perceived intensity is the same as or different from another’s. We consider each individual subject independently of the others.

To avoid possible confusion about the feedback-based evaluation of our social grouping scheme and its accuracy calculation, we note the following. The goal of our study is to construct a computational model that estimates a person’s perceived mobile social tie and then to use the verified model to infer other useful characteristics of the mobile social group structure. The feedback from the human subjects is the actual perception, i.e., the ground truth, used to evaluate our model. We then investigate the group sizes and their successive ratios on the basis of the validated social closeness and social grouping algorithm. To clarify what counts as a correct or incorrect classification, we revisit our survey and evaluation processes. In our mobile social survey, we recruited ten mobile phone users who are faculty, staff, and students in the computer science and engineering department at the University of North Texas. We obtained 3-month call logs from each subject, who was then asked to identify his/her perceived social tie with each Associated User in the call logs. The definition of the social tie was given to the subject as described in Appendix 9.3. In our validation process, for each subject (Center User), we use our social closeness and grouping model to compute the social tie for each Associated User and then compare this computed value against the actual feedback from the subject. The accuracy rate of our model is computed for each subject as the ratio of the number of correctly classified Associated Users to the total number of Associated Users. For example, suppose a given subject has five Associated Users and our model computes the social tie (group) as 1, 2, 3, 3, 3 for each Associated User, respectively. We then check these computed values against the social ties perceived by the subject; suppose the subject’s feedback gives the social tie as 1, 2, 2, 3, 3 for each Associated User, respectively. The accuracy rate is then computed as (Number of correctly classified social ties based on our model)/(Total number of Associated Users) = 4/5 = 80%.
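
The five-user example can be reproduced with a short sketch. The function names are hypothetical; the second helper reports the largest group-level discrepancy, which corresponds to the earlier observation that no classification is off by more than one level.

```python
def accuracy_rate(predicted, reported):
    """Fraction of Associated Users whose computed group matches the feedback."""
    if not predicted or len(predicted) != len(reported):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, r in zip(predicted, reported) if p == r)
    return hits / len(predicted)


def max_level_error(predicted, reported):
    """Largest absolute difference between computed and reported group."""
    return max(abs(p - r) for p, r in zip(predicted, reported))


model_groups = [1, 2, 3, 3, 3]    # groups computed by the model in the example
subject_groups = [1, 2, 2, 3, 3]  # groups reported by the subject in the example
rate = accuracy_rate(model_groups, subject_groups)
worst = max_level_error(model_groups, subject_groups)
```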

3 Social group sizes and scaling ratio

Based on the social closeness and group inference in the previous section, it is straightforward to find social group sizes for any given Center User.

In our social world, people who know a lot of people and have many friends are typically socially active. On the other hand, people who are socially less active tend to have a smaller social network. This is inherently the case for the mobile social network as well. Since the activeness of a phone user (Center User) is related to social group sizes, we define the activeness of a Center User by the number of outgoing calls per day. Based on this definition, Center Users can be divided into three categories:
  1. Low active users: Center Users who have fewer than six outgoing calls per day.

  2. Medium active users: Center Users who have between six and ten outgoing calls per day.

  3. High active users: Center Users who have more than ten outgoing calls per day.
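
The three categories translate directly into a threshold function. In this sketch the function name and the sample counts are hypothetical, and the boundaries at exactly six and ten calls per day are assumed to fall in the medium category.

```python
def activeness_category(outgoing_calls, days):
    """Classify a Center User by mean outgoing calls per day."""
    if days <= 0:
        raise ValueError("days must be positive")
    rate = outgoing_calls / days
    if rate < 6:
        return "low"
    if rate <= 10:  # assumption: six and ten themselves count as medium
        return "medium"
    return "high"


# Made-up totals over a 90-day log.
categories = [activeness_category(n, 90) for n in (270, 720, 1350)]
```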

Table 2 summarizes the result of social group sizes based on our entire datasets (30 mobile phone users) by listing the mean group sizes for each social group and each category of the Center Users based on the activeness. It can be observed that the mean group sizes have scaling ratio of 8.
Table 2

The mean group sizes of each social group for low, medium, and high socially active Center Users

Social group   Mean group sizes
               Low active users   Medium active users   High active users
Group 1        1.00               1.50                  2.00
Group 2        8.67               11.83                 16.91
Group 3        63.33              90.83                 126.64

For low active users, group 1 has mean size of 1.00 (\( S_{1}^{L} = 2^{0} \)), group 2 has mean size of 8.67 ≈ 8 (\( S_{2}^{L} = 2^{3} \)), and group 3 has mean size of 63.33 ≈ 64 (\( S_{3}^{L} = 2^{6} \)). Thus, scaling ratio for low active users is approximately \( {\frac{{S_{i + 1}^{L} }}{{S_{i}^{L} }}} = 2^{3} = 8. \)

For medium active users, group 1 has mean size of \( 1.50 \approx 2^{0.5} \) (\( S_{1}^{M} = 2^{0.5} \)), group 2 has mean size of \( 11.83 \approx 2^{3.5} \) (\( S_{2}^{M} = 2^{3.5} \)), and group 3 has mean size of \( 90.83 \approx 2^{6.5} \) (\( S_{3}^{M} = 2^{6.5} \)). Hence, scaling ratio for medium active users is approximately \( {\frac{{S_{i + 1}^{M} }}{{S_{i}^{M} }}} = 2^{3} = 8. \)

For high active users, group 1 has mean size of 2.00 (\( S_{1}^{H} = 2^{1} \)), group 2 has mean size of 16.91 ≈ 16 (\( S_{2}^{H} = 2^{4} \)), and group 3 has mean size of 126.64 ≈ 128 (\( S_{3}^{H} = 2^{7} \)). Similarly, scaling ratio for high active users is approximately \( {\frac{{S_{i + 1}^{H} }}{{S_{i}^{H} }}} = 2^{3} = 8. \)
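
As a quick arithmetic check, the successive ratios can be computed directly from the unrounded mean group sizes quoted above. The helper names are hypothetical, and the geometric mean is used here only as one reasonable way to summarize multiplicative ratios; it is not a step described in the article.

```python
import math

# Mean sizes of groups 1, 2, 3 for each activeness category, as quoted in the text.
mean_sizes = {
    "low": [1.00, 8.67, 63.33],
    "medium": [1.50, 11.83, 90.83],
    "high": [2.00, 16.91, 126.64],
}


def successive_ratios(sizes):
    """Ratio of each group size to the size of the preceding group."""
    return [larger / smaller for smaller, larger in zip(sizes, sizes[1:])]


def geometric_mean(values):
    """Geometric mean, the natural average for multiplicative ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


ratios = {name: successive_ratios(sizes) for name, sizes in mean_sizes.items()}
all_ratios = [r for rs in ratios.values() for r in rs]
overall = geometric_mean(all_ratios)
```

All six raw ratios fall between 7 and 9, and their geometric mean is close to 8, consistent with the rounded powers of two used in the text.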

From the results of all three categories of Center Users, it is very interesting to see that activeness of Center User indeed reflects the social group sizes, and the same scaling ratio is found for every category and that is
$$ {\frac{{S_{i + 1}^{L} }}{{S_{i}^{L} }}} = {\frac{{S_{i + 1}^{M} }}{{S_{i}^{M} }}} = {\frac{{S_{i + 1}^{H} }}{{S_{i}^{H} }}} = 8. $$
Besides a simple analysis based on the mean group sizes, we further employ a more systematic method of analysis that uses raw group sizes. We thus consider all 90 grouping clusters in our dataset, which are shown in Fig. 4 (in semi-log scale) where the sample distribution can be represented as a sequence of Dirac’s delta functions given by Eq. (9).
Fig. 4

Distribution of group sizes in our dataset

The main idea is to take each individual group size into account, instead of the mean group sizes, so that the scaling ratio is derived from the raw data. To do so, we need to lay out the raw data and extract the pattern from which the scaling ratio can be obtained. To lay out the data, each of the 90 data points (three social groups for each of the 30 Center Users) is plotted on a simple Center User versus Group Size plot (Fig. 4), which provides a graphical representation of the distribution of the data. The plot is in semi-log scale because the clusters of social group sizes appear to be separated by a constant multiplicative factor. A semi-log scale therefore represents the data distribution better than a linear scale, which would have depicted a non-periodic or less periodic signal. To extract the pattern from this data distribution, we estimate the raw distribution with a Gaussian kernel density estimator, so that the data distribution is transformed into a probability density function (pdf) from which the scaling ratio can be obtained by extracting the periodicity of the signal (pdf).
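
A Gaussian kernel density estimate is simple enough to sketch without external libraries. The version below is an illustration only: it uses a fixed, hand-picked bandwidth rather than the Sheather–Jones selection used in the article, it applies the kernel to ln s so that clusters separated by a constant ratio become evenly spaced, and the group sizes are invented.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels at each sample."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Made-up group sizes forming three clusters; the kernel is applied to ln(s).
sizes = [1, 2, 2, 8, 9, 10, 64, 70, 80]
log_sizes = [math.log(s) for s in sizes]
density = gaussian_kde(log_sizes, bandwidth=0.3)

# Riemann-sum check that the estimate integrates to approximately one.
area = sum(density(-3.0 + 0.01 * k) for k in range(1101)) * 0.01
```

With these toy values the estimated density is high near each cluster and low between clusters, which is the kind of oscillation in ln s that the periodicity analysis looks for.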

Figure 5 shows the pdf f(s) estimated by a Gaussian kernel estimator (Wasserman 2005) with zero mean, unit variance, and AMISE-optimal bandwidth selection using the Sheather–Jones solve-the-equation plug-in method (Sheather and Jones 1991). From Fig. 5, it can be observed that there are three main clusters of local peaks of f(s), around 2, 10, and 80. These clusters represent the cumulative frequency of the raw data distribution in Fig. 4 for the three social groups. Even though the peaks are spread out, the three clusters can still be observed. With the obtained pdf, the challenge is to extract a possible periodicity in the variable ln s, which is called “log-periodicity” (Sornette 1998); e.g., if the previous scaling ratio in Eq. (2) is true, then the periodic oscillation of f(s) can be expressed in the variable ln s with an expected mean period of \( \ln 8 \approx 2.08 \). We use the generalized q-analysis, or (H, q)-analysis (Zhou et al. 2002), which has been shown to be very efficient for finding periodicity (Zhou et al. 2005). The q-analysis is a natural tool for describing discrete scale invariance (Erzan and Eckmann 1997). The (H, q)-analysis consists in constructing the (H, q)-derivative, which is given by Eq. (10).
Fig. 5

The pdf (f(s)) obtained from Gaussian kernel density estimation of group size s

The (H, q)-derivative has two control parameters: the discrete scale factor q, which characterizes the log-periodic structure, and the exponent H, which allows us to detrend f(s) in an adaptive way.
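
For readers who want to experiment, the sketch below implements the (H, q)-derivative in the form commonly stated in the q-analysis literature, \( D_{q}^{H} f(x) = [f(x) - f(qx)]/[(1 - q)x]^{H} \). This form is an assumption here, since Eq. (10) is not reproduced in this section; the function name and the test function are hypothetical.

```python
def hq_derivative(f, x, h, q):
    """Assumed (H, q)-derivative: [f(x) - f(q x)] / ((1 - q) x) ** H.

    With H = 1 this reduces to the ordinary q-derivative; other values of H
    rescale the difference to detrend the signal.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1 in this sketch")
    return (f(x) - f(q * x)) / ((1.0 - q) * x) ** h


def square(x):
    return x * x


# For f(x) = x^2 and H = 1 the q-derivative equals (1 + q) x.
value = hq_derivative(square, 2.0, h=1.0, q=0.5)
```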

To extract the log-periodicity in f(s), we then use a Lomb periodogram analysis (Press et al. 1996). The Lomb periodogram or Lomb power P(ω) is given by Eq. (11). We test for the statistical significance of possible log-periodic oscillations. For each (H, q) pair, the highest peak P(H, q) and its associated angular log-frequency ω(H, q) in the Lomb periodogram are obtained. The basic criterion used to identify a log-periodic signal is the strength of the Lomb periodogram analysis, i.e., the height of the spectral peaks. Figure 8 presents the Lomb periodograms of the (H, q)-derivative \( D_{q}^{H} f(s) \) for different pairs of (H, q) with −1.0 ≤ H ≤ 1.0 and 0.5 ≤ q ≤ 1.0. The highest Lomb power is found at H = −0.7 and q = 0.62 (shown in Fig. 6), and the corresponding \( D_{q}^{H} f(s) \) is shown in Fig. 7. The highest peak is at ω = 2.99 with a Lomb power of 53.25. The preferred scaling ratio is thus \( \lambda = e^{2\pi /\omega } = 8.17 \approx 8 \), which is consistent with the previous result using mean group sizes.
Fig. 6

The highest peak of Lomb power is found at H = −0.7 and q = 0.62

Fig. 7

The (H, q)-derivative \( D_{q}^{H}f(s) \) as a function of group size s with H = −0.7 and q = 0.62

Note for readers who are not familiar with the generalized q-analysis: the goal of applying it here is to extract the (most probable) periodicity of the signal (f(s), shown in Fig. 5) obtained from the raw data of social group sizes (shown in Fig. 4). The criterion used to identify the most probable periodicity is the strength of the Lomb power (given in Eq. (11)). Figure 6 shows that the highest Lomb power is found at H = −0.7 and q = 0.62, where the corresponding angular log-frequency is \( \omega (H = - 0.7,q = 0.62) = 2.99 \) (shown in Fig. 8). Therefore, the log-period is \( \ln \lambda = 2\pi /\omega \), so that \( \lambda = e^{2\pi /\omega } = e^{2\pi /2.99} = 8.17 \approx 8 \).
Fig. 8

Lomb power as a function of angular log-frequency \( \omega \) of the (H, q)-derivative \( D_{q}^{H} f(s) \) for different pairs of (H, q), where the red line indicates the average of Lomb power
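
The periodogram step and the conversion from angular log-frequency to scaling ratio can be sketched as follows. The periodogram follows the standard normalized Lomb form; the function names, the frequency grid, and the synthetic cosine signal (standing in for the detrended \( D_{q}^{H} f(s) \) sampled in ln s) are assumptions made for illustration.

```python
import math


def lomb_power(t, y, omega):
    """Normalized Lomb periodogram of samples (t, y) at angular frequency omega."""
    n = len(t)
    mean = sum(y) / n
    variance = sum((v - mean) ** 2 for v in y) / (n - 1)
    # The offset tau makes the result independent of shifts in t.
    tau = math.atan2(
        sum(math.sin(2.0 * omega * ti) for ti in t),
        sum(math.cos(2.0 * omega * ti) for ti in t),
    ) / (2.0 * omega)
    cos_terms = [math.cos(omega * (ti - tau)) for ti in t]
    sin_terms = [math.sin(omega * (ti - tau)) for ti in t]
    y_cos = sum((v - mean) * c for v, c in zip(y, cos_terms))
    y_sin = sum((v - mean) * s for v, s in zip(y, sin_terms))
    return (
        y_cos ** 2 / sum(c * c for c in cos_terms)
        + y_sin ** 2 / sum(s * s for s in sin_terms)
    ) / (2.0 * variance)


def scaling_ratio(omega):
    """Preferred scaling ratio for angular log-frequency omega: exp(2*pi/omega)."""
    return math.exp(2.0 * math.pi / omega)


# Synthetic log-periodic signal with angular log-frequency 3.0.
t = [0.05 * k for k in range(200)]
y = [math.cos(3.0 * ti) for ti in t]
omegas = [0.5 + 0.01 * k for k in range(551)]
peak_omega = max(omegas, key=lambda w: lomb_power(t, y, w))
ratio_at_reported_peak = scaling_ratio(2.99)
```

On the synthetic signal the strongest peak lies near the true frequency, and applying the conversion to the reported peak at ω = 2.99 gives a ratio close to 8.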

4 Related work

Closeness in face-to-face social networks has been studied in psychology, from which various definitions (Birtchnell et al. 1997; Jamieson 1998; Popovic et al. 2003), components (Kayser and Himle 1994; Sherman and Thelen 1996), classifications of closeness (Orlofsky et al. 1973; Schaefer and Olson 1981), and social support (Milne 1999) have been defined.

As online social networking gains popularity, online social analysis has also been studied extensively, and several studies have discussed social closeness in online communities (Nolker et al. 2005; Mesch and Talmud 2006; Zhdanova et al. 2007). To our knowledge, no scientific research has been reported on quantifying closeness in mobile social networks.

Mobile social closeness is mentioned as an important component of interaction syntax for mobile social software in Bleecker et al. (2006), but it is never defined there. A study that comes close to defining mobile social closeness is Hossain et al. (2007), in which the authors measured the closeness centrality of mobile phone users based on the definition proposed by Freeman (1978).

Social group sizes and scaling ratio have been studied in sociology (Coleman 1964), social anthropology (Kottak 1991; Scupin 1992), and psychology (Dunbar 1993; Zhou et al. 2005) for face-to-face social networks, but not for mobile social networks.

5 Discussion

Social networking is a process of initiating, developing, and maintaining relationships. With advances in technology, we now interact with people in online and mobile networks besides the conventional face-to-face social networks. Despite the different setups, these three networks share common members; e.g., we often have friends whom we contact in the face-to-face network as well as in the online and mobile networks. Figure 9 shows a Venn diagram of these social networks, which share common members in the overlapping areas. Since almost all mobile relationships are initiated through face-to-face networks, the overlap between the mobile and face-to-face networks is relatively larger than the overlap between the online and face-to-face networks, where many online social members are people whom we have never met in person. However, we occasionally communicate on the mobile phone with people whom we have never met (e.g., job interviews by phone, customer service calls, etc.), which results in a small non-overlapping area between the mobile and face-to-face networks.
Fig. 9

Venn diagram of three social networks

Mobile social relationships are (mostly) initiated through face-to-face social networks. They are developed and maintained through the intensity of communication, which also strengthens the face-to-face relationships. Thus, despite a small non-overlapping area between the mobile and face-to-face social networks, the mobile social network is (roughly) a subset of the face-to-face social network.

For face-to-face social networks, Zhou et al. (2005) found a scaling ratio close to “3” based on the results of the previous studies of social grouping (Dunbar 1993; Dunbar and Spoors 1995; Hill and Dunbar 2003), which divided social members into six groups with different group sizes as described in Table 3.
Table 3 Face-to-face social grouping

| Group | Group size | Description |
|---|---|---|
| Support clique (Dunbar and Spoors 1995; Hill and Dunbar 2003) | | A group of individuals from whom the subject would ask personal advice or help in times of severe emotional and financial distress |
| Sympathy group (Dunbar and Spoors 1995; Hill and Dunbar 2003) | | A group of individuals with whom the subject has special tie; these individuals are typically contacted at least once a month |
| Overnight camp or band (Dunbar 1993) | | A group of individuals from whom the subject feels a personal allegiance at a given time |
| Clan or regional group (Dunbar 1993) | | A group of individuals with whom the subject maintains a coherent personal relationship |
| Megaband (Dunbar 1993) | | A group of individuals with whom the subject maintains distant relationship |
| Tribe (Dunbar 1993) | | A group of individuals with whom the subject maintains the furthest distant relationship |

Dunbar’s number (1992), “150”, indicates the number of individuals with whom a stable interpersonal relationship can be maintained. We therefore restrict our attention to face-to-face social groups 1 to 4 for comparison with our findings in this study.

Let \( F_{g} \) and \( M_{g} \) denote face-to-face social group g and mobile social group g, respectively. Figure 10 compares the group sizes of the face-to-face network with those of the mobile social network. From this comparison, it is straightforward to see that

  1. \( M_{1} \subset F_{1} \)

  2. \( M_{2} \subset F_{1} \cup F_{2} \)

  3. \( M_{3} \subset F_{2} \cup F_{3} \cup F_{4} \)

Fig. 10 Comparison of group sizes between face-to-face and mobile social network

Therefore, we conclude that the mobile social network is a subset of the face-to-face social network and that the two groupings are not necessarily the same (although relationships between them can still be drawn, as shown earlier); hence the scaling ratios are distinct (“3” for face-to-face and “8” for mobile social grouping).
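The three subset relations between the mobile groups \( M_{g} \) and the face-to-face groups \( F_{g} \) can be checked mechanically once group memberships are known. The following is a minimal sketch using Python sets; the contact names and memberships are entirely hypothetical (they are not data from this study), and the helper names `union_of` and `check_relations` are introduced here only for illustration.

```python
# Hypothetical memberships, for illustration only (not data from the study).
face_to_face = {
    1: {"ann", "bob", "cai"},
    2: {"dee", "eli", "fay", "gus"},
    3: {"hal", "ivy", "jon"},
    4: {"kim", "lee", "moe", "ned"},
}
mobile = {
    1: {"ann", "bob"},
    2: {"cai", "dee", "eli"},
    3: {"fay", "hal", "kim", "lee"},
}


def union_of(groups, indices):
    """Return the union of the groups with the given indices."""
    members = set()
    for g in indices:
        members |= groups[g]
    return members


# Face-to-face groups that each mobile group should fall within,
# following the three relations listed in the text.
expected_cover = {1: [1], 2: [1, 2], 3: [2, 3, 4]}


def check_relations(mobile_groups, f2f_groups, cover):
    """Map each mobile group g to True if M_g is a proper subset of its cover."""
    return {
        g: mobile_groups[g] < union_of(f2f_groups, idx)
        for g, idx in cover.items()
    }


results = check_relations(mobile, face_to_face, expected_cover)
```

The `<` operator on sets tests for a proper subset, which matches the \( \subset \) notation used above; `<=` would be the choice if equality were also acceptable.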

6 Societal context

In this study, mobile social networking has been structured into three discrete hierarchies. Group sizes and their successive ratios are presented. Group sizes vary depending on how active the users are, but the ratio is nearly constant (close to eight). The findings of this study can be beneficial to:
  1. Mobile phone service providers. With the growing population of mobile phone users, mobile phone service providers are competing to offer better services and plans to their existing and, especially, potential customers. The success of “T-Mobile myFaves” (a plan that allows the user to make unlimited calls to his/her five favorite people) suggests that future mobile phone services will emphasize social context. Thus, the ability to recognize mobile social groups and their sizes can enrich services, e.g., personalized plans, per-social-group rates, active/non-active social plans, etc.

  2. Privacy settings. Privacy concerns are rising as today’s telecommunication technologies allow people to be connected pervasively (Bhaskar et al. 2007). The mobile phone has become more than just a voice communication device; it is also a camera, a book, an Internet browser, and so on. Information shared on a connected network can be sensitive and private without the user being aware of it. Thus, the ability to recognize the user’s social context can help a context-aware mobile phone (Hakkila and Mantyjarvi 2005) configure its privacy level appropriately.

  3. Anomaly detection. Mobile phone calls form a communication network. Anomaly detection aims to identify abnormal behavior occurring in the network. Anomalies in the network usually indicate fraud, congestion, or even terrorism. Social context can be used to enhance anomaly detection methods such as link-based (Wan et al. 2008) and rule-based (Wong et al. 2002) mechanisms.

  4. Phone call filtering. Because of its comfort and ease of use, mobile telephony is widely preferred over other communication modes, e.g., e-mail and face-to-face interaction. However, this ease of real-time communication brings challenges that are less pertinent to e-mail and face-to-face interaction. One problem that mobile users experience is spam and unwanted calls (Kolan and Dantu 2007). Such calls can be mitigated by using a social grouping scheme to develop a protocol that allows or blocks phone calls based on social context (Dantu and Kolan 2005; Kolan et al. 2008).

  5. Epidemiology. Today’s mobile phones provide convenience by integrating traditional telephony with handheld computing. However, the flexibility of running third-party software also leaves the phone open to malicious viruses. In fact, in the past few years, hundreds of mobile phone viruses have emerged and spread through various means such as SMS/MMS, Bluetooth, and traditional IP-based applications (Cheng et al. 2007). Integrating the social group scaling ratio into an epidemiological model can help predict and estimate the spread of virus outbreaks. Recognition of social context can also reduce the risk of infection.

  6. Business marketing. Acquiring new clients is one of the top priorities of a business. Marketing is the process of communicating with individuals and communities about existing and new products and services. To increase its effectiveness, the social context of existing clients can be used to guide the direction of marketing while keeping it cost-efficient. Social context has shown a positive impact on marketing in previous studies in psychology and marketing research (Smith and Berger 1996; Shang and Croson 2005a, b, 2008).
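To make the phone call filtering idea in item 4 more concrete, the sketch below shows one way an allow/block decision could be keyed on the caller's mobile social group. It is not the protocol of the cited works: the phone numbers, the `max_group` threshold, the "voicemail" outcome, and the choice to block callers outside all groups are assumptions made only for illustration.

```python
from typing import Dict, Optional

# Hypothetical mapping from caller number to mobile social group (1 = closest).
caller_group: Dict[str, int] = {
    "555-0101": 1,
    "555-0102": 2,
    "555-0103": 3,
}


def filter_call(caller: str, groups: Dict[str, int], max_group: int = 2) -> str:
    """Decide what to do with an incoming call based on the caller's group.

    Callers in groups 1..max_group ring through, callers in more distant
    groups go to voicemail, and callers in no group are blocked.
    This policy is an illustrative assumption, not a published protocol.
    """
    group: Optional[int] = groups.get(caller)
    if group is None:
        return "block"
    if group <= max_group:
        return "allow"
    return "voicemail"


decisions = [
    filter_call(number, caller_group)
    for number in ("555-0101", "555-0103", "555-0199")
]
```

A practical policy would likely treat unknown callers more gently than an outright block, since, as noted in the discussion, some legitimate calls come from people the user has never met.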


7 Limitations of the study

This study has some limitations, which can be summarized as follows:
  1. Diversity of the subjects. Our subjects were faculty, staff, and students, who form a homogeneous group. The results would generalize better with more diverse subjects (e.g., subjects with different backgrounds and lifestyles).

  2. Amount of data for analysis. Our analysis is based on mobile phone call logs collected over 3 months. As the call logs grow to 4, 5, 6 months, and so on, the number of Associated Users also increases as new social relationships are initiated. The limited amount of call logs does not allow us to study the impact of new social relationships on the social group sizes and scaling ratio. On the other hand, since Property 2 states that social closeness and social ties change over time, it would be very interesting to investigate what indicates the current relationships and whether, with only these current relationships, the group sizes and scaling ratio would remain unchanged.

  3. Sample size. It is difficult to collect call logs because of privacy issues and subjects’ unwillingness to participate in a time-consuming survey. Our 30 mobile users might not completely represent actual mobile social networks, but we believe that this is a first step toward further analysis. We will continue to collect more datasets for our future studies.

  4. Characteristics of Associated Users. The call duration may depend on the characteristics of the Associated Users (callers or callees), e.g., discursive, talkative, or cryptic. The social closeness computation is made from the Center User’s perspective, which means that the duration of each call is influenced by the willingness of the Center User; nevertheless, some calls last considerably longer than the Center User would wish. Such calls are common, as we all may have experienced in our daily lives, and we believe that they exist in our dataset. We also believe that their number is relatively small, because the Center User usually learns from one or a few such calls and tries to avoid similar situations (spending an undesirably long time on the phone talking with, or listening to, the same person). We are aware of these calls and treat them as noise or outliers in our dataset. Since the characteristics of the Associated Users were not included in our survey study, this is another limitation that we would like to address in future work.

  5. Diversity of situational contexts. Since our data are drawn from typical mobile phone users (leading ordinary lives under no significant political situations or extraordinary circumstances, e.g., natural disasters, major social or economic influences, etc.), the impact of such extraordinary circumstances is not evidenced or analyzed in the present study. For example, suppose a subject is in an unusual situation such as a political movement. He could be communicating with several new people (outside his usual daily life). Intuitively, he forms new relationships; according to our model, he establishes social ties. These newly established ties would become his current social ties within his current situational context. Therefore, instead of 3, 24, and 192 members in his groups 1, 2, and 3, respectively, as in his normal situational context, he might have 6, 24, and 96, or some other sizes, in this abnormal situational context. We speculate that our model can still be applied across a wide range of situational contexts. However, the resulting group sizes and scaling ratio might be different, and the grouping scheme could differ as well. This is a very interesting issue to investigate in our future work.
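The two sets of group sizes quoted in the last item can be compared by computing the ratio between successive group sizes. The short sketch below does this for the two examples given in the text; it assumes that the scaling ratio is read as the ratio of each group size to the preceding one, and the function name `successive_ratios` is introduced here only for illustration.

```python
def successive_ratios(sizes):
    """Return the ratio of each group size to the one before it."""
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


# Group sizes for groups 1, 2, and 3 as quoted in the text.
normal_context = [3, 24, 192]    # normal situational context
abnormal_context = [6, 24, 96]   # the text's speculative abnormal context

normal_ratios = successive_ratios(normal_context)       # [8.0, 8.0]
abnormal_ratios = successive_ratios(abnormal_context)   # [4.0, 4.0]
```

In the normal example both successive ratios equal 8, whereas in the speculative abnormal example they equal 4, which illustrates how a change in situational context could alter the scaling ratio even when the same grouping model is applied.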


8 Conclusions and future work

With the rapidly growing population of mobile device users, more mobile social services are being offered, and research and development in mobile social computing have intensified accordingly. In this article, we present a simple but efficient method to quantitatively define mobile social closeness, which is then used to categorize a mobile social network into three groups. Our social grouping approach has been validated on real-life datasets with a high accuracy rate. From our mobile social grouping results, we identify a group-size scaling ratio close to “8” based on two different analyses, one using mean group sizes and the other using all raw group clusters. In our discussion of social networks, we point out the overlapping areas (common members) between face-to-face, online, and mobile social networks and conclude from our findings that the mobile social network is a subset of the face-to-face social network, each with its own grouping and its own constant group-size scaling ratio. In the societal context, we show that our findings can be beneficial to mobile phone service providers, privacy settings, anomaly detection, phone call filtering, epidemiology, and business marketing.

Nevertheless, there are limitations in our study. The diversity of the subjects’ backgrounds is one of the limitations, since all of our subjects are faculty, staff, and students in the department of computer science and engineering. Their similar backgrounds thus limit the generalization of our results. The amount of data for analysis is also crucial, as described in Property 2; larger data (longer call logs, e.g., 6 months, 1 year, etc.) would allow us to study the impact of new social relationships on the social group sizes and scaling ratio. The characteristics of the Associated Users (callers/callees) also play an important role in the social closeness computation, as they influence some call durations. Without a survey study of these characteristics, our findings are lacking in this respect. The privacy issue is the biggest obstacle in our survey study. We find that it is very difficult to find subjects who are willing to share their call logs and provide feedback about their social relationships. Our study is, therefore, limited to 30 subjects, who might not be fully representative of mobile social networks as a whole, but we believe that they are part of a first step toward further analysis in this line of research.

In future work, we will continue to investigate the correlation between the mobile social group sizes and scaling ratio and the initiation of new social relationships as time progresses. We will also investigate how to characterize current social relationships, which we believe play an important role in identifying the underlying correlation.


This work is supported by the National Science Foundation under grants CNS-0627754, CNS-0619871 and CNS-0551694.

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of North Texas, Denton, USA
