1 Introduction

We live in a world of data. Today there are 6.8 billion mobile phone subscribers worldwide, with millions of new subscribers every day [1]. More importantly, the almost universal adoption of mobile phones and the exponential increase in the use of social media and other Internet services are generating an enormous amount of data about human behavior, with a breadth and depth that was previously inconceivable. As recently reviewed by Blondel et al. [2], Call Detail Records (CDRs), which mobile phone operators need for billing purposes, can be exploited to extract mobility patterns [3–5], to model social interactions [6, 7], city structures [8], and epidemic spreading [9, 10], to estimate population densities [11], and to predict socio-economic indicators and outcomes of territories [12, 13]. Similarly, the emergence of social media (e.g. Twitter, Foursquare, Facebook) gives researchers further opportunities to study different aspects of human behavior, such as people’s mobility [14] and the social well-being of individuals and communities [15].

In this context, research challenges that give a large number of research teams access to the same dataset are becoming a valuable framework for advancing the state of the art in the field and for sustaining the reproducibility that the scientific community needs. One example is Orange’s ‘Data for Development’ (D4D) initiative [16, 17]. Last year, Telecom Italia, with support from MIT Media Lab, Northeastern University, Fondazione Bruno Kessler, Polytechnic University of Milan, University of Trento, EIT ICT Labs, Trento Rise, and Spazio Dati, organized the ‘Telecom Italia Big Data Challenge’ [18], providing a multi-source, geo-referenced, and anonymized dataset composed of telecommunications, weather, news, Twitter, and electricity data from two Italian areas: the city of Milan and the Trentino province [19].

More than 650 teams from more than 100 universities participated in the ‘Telecom Italia Big Data Challenge’. The projects ranged from predicting energy consumption to exploring the impact of specific events on mobility and comparing mobile phone calling patterns with economic, demographic, and well-being indicators.

The goal of the present thematic series is to showcase some of the most outstanding contributions submitted to the ‘Telecom Italia Big Data Challenge 2014’ and to provide a venue for discussing recent advances in the application of CDRs and social media data to the study of individual and collective behaviors, with particular attention to city dynamics.

2 Contributions

The first contribution, by De Domenico et al. [20], investigates route assignments in smart multimodal systems [21, 22], where individual daily trips follow recommendations based on personal and community constraints. The proposed approach is of special interest for designing efficient cities, where inhabitants could be automatically routed in order to reduce traffic and pollution. A person might want to avoid routes with high traffic or areas with high crime rates, or to favor routes through shopping and tourist areas. However, individual route choices that do not account for the state of the whole urban system may lead to traffic congestion, increased pollution, etc. [23]. In their paper, the authors propose to model the trips in an urban system as interacting particles with data-driven origin-destination pairs. The route choices of the interacting particles are based on a time-varying potential energy landscape that seeks to satisfy simultaneously individual constraints (e.g. avoiding specific areas of the city) and community constraints (e.g. reducing traffic and pollution in specific city areas). Specifically, the proposed framework integrates multiple layers of constraints to favor certain routes and to study the effects of the proposed recommendations. The results show that the synergy among individual choices plays a fundamental role in designing an efficient and smart city: only when all individuals follow the recommended routes does the city traffic approach the ideal mobility scenario. Interestingly, the proposed method allows the traffic state of the city to be monitored in real time, automatically identifying areas that are experiencing congestion and hence supporting urban authorities and policy makers in planning interventions.
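As a rough illustration of how layered constraints can steer a route choice, the following sketch uses a plain Dijkstra shortest-path search in which each edge cost is its length plus the sum of several penalty layers. This is a deliberately simplified, static stand-in and not the interacting-particle model with a time-varying potential used by the authors; the toy graph, the penalty values, and the function name `route` are assumptions made for this example.

```python
import heapq


def route(graph, penalties, source, target):
    """Dijkstra shortest path; edge cost = length + sum of penalty layers.

    graph: dict mapping node -> {neighbor: length}
    penalties: list of dicts mapping (u, v) -> extra cost (hypothetical layers)
    """
    def cost(u, v):
        return graph[u][v] + sum(layer.get((u, v), 0.0) for layer in penalties)

    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph[u]:
            nd = d + cost(u, v)
            if nd < best.get(v, float("inf")):
                best[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in best:
        return None  # target unreachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy road network (assumed data): two alternative routes from A to D.
graph = {
    "A": {"B": 1.0, "C": 2.0},
    "B": {"D": 1.0},
    "C": {"D": 2.0},
    "D": {},
}
individual_layer = {}                    # e.g. areas this traveler wants to avoid
community_layer = {("A", "B"): 5.0}      # e.g. a congestion penalty on edge A-B

free_route = route(graph, [], "A", "D")
recommended_route = route(graph, [individual_layer, community_layer], "A", "D")
```

Without any penalty layer the shorter route through B is chosen; once the community layer penalizes the congested edge, the search switches to the route through C.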

The second paper, contributed by Douglass et al. [24], uses telecommunications activity data to create high-resolution population estimates. Traditional local census estimates are expensive, contingent on participation, and often suffer from several logistical issues. As shown in [11], telecommunications data are a promising new source of real-time population estimates. In their paper, Douglass et al. show that the correlation between call volume and population in a given area of Milan is scale invariant above a certain population size. Using Random Forest regression [25], the authors then provide a reliable estimate of population for populous areas. The results suggest that the method could also be extended to estimate population in less dense areas and to create estimates by gender, age, and ethnicity. Finally, the authors evaluate models for predicting the percentage of foreign population.

In the third paper, Bajardi et al. [26] study urban spaces through the analysis of mobile phone records of users with strong international links, e.g. migrants and visitors travelling to a city for tourism or for business. More precisely, the authors focus on mobile phone records collected in Milan and use an entropy function to measure the heterogeneity of country codes in the calling patterns of a city neighborhood. Then, they propose a topological classification based on persistent homology and cluster the nationalities associated with the calls’ sources and destinations outside Italy into two main groups. The first group comprises low-income countries, whose topological spatial patterns show a strong cyclic spatial distribution. The second group is formed by high-income countries, whose spatial distribution is scattered in small areas over the city. These results indicate that migrant communities from low-income countries tend to aggregate in cohesive spatial structures and to live in the city’s residential areas, mainly around the city centre, while communities associated with higher-income countries tend to represent the movement patterns of tourists and/or highly specialized professionals in central and high-entropy urban areas. As pointed out by the authors, the findings are in line with those predicted by spatial assimilation theory [27] and confirm the empirical observation that different socio-economic migrant conditions can show distinct spatial clustering patterns [28]. Moreover, the authors demonstrate how mobile phone data can provide very specific spatial and temporal trajectories of visitors from a given country during a mass gathering event (e.g. large sporting events).
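To make the notion of country-code heterogeneity concrete, the sketch below computes the Shannon entropy of the country codes observed in one neighborhood. The exact entropy function used by the authors is not specified here, so the Shannon form, the function name `country_code_entropy`, and the sample data are assumptions for illustration only.

```python
import math
from collections import Counter


def country_code_entropy(country_codes):
    """Shannon entropy (in bits) of the country codes seen in one area.

    country_codes: iterable of country-code strings, one per international call.
    Returns 0.0 when all calls share a single code (or when there are no calls).
    """
    counts = Counter(country_codes)
    total = sum(counts.values())
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy += p * math.log2(1 / p)
    return entropy


# Hypothetical neighborhoods: one with mixed destinations, one dominated by a single country.
mixed_area = ["+44", "+33", "+49", "+1"]
uniform_area = ["+44"] * 5

mixed_entropy = country_code_entropy(mixed_area)      # four equally likely codes
uniform_entropy = country_code_entropy(uniform_area)  # a single code
```

A neighborhood whose international calls are spread evenly over many countries has high entropy, whereas one dominated by a single country has entropy close to zero.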

The fourth and last contribution, by Alshamsi et al. [29], focuses on the relationship between urban communication and urban happiness. Specifically, the authors analyze geo-located tweets within Milan to produce a detailed spatial map of urban sentiments. Then, they use communication intensity data to build a directed network of urban areas in which the edge weights represent the communication strength between areas. Their results show no correlation between the happiness level of urban areas and the amount of communication the areas receive or initiate. Instead, happy urban areas tend to interact with other happy areas more than with unhappy areas and, similarly, unhappy areas tend to interact with other unhappy areas more than with happy areas. Interestingly, this urban happiness homophily supports previous findings on individual happiness homophily [30]. The results may help policy makers set strategies that increase urban happiness.
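One simple way to picture homophily on such a weighted directed network is to measure the share of total communication weight that flows between areas carrying the same happiness label. The sketch below does this on toy data; it is not the statistic used by the authors, and the function name `same_label_share`, the binary labels, and the edge weights are all assumptions.

```python
def same_label_share(edges, is_happy):
    """Fraction of total edge weight linking areas with the same label.

    edges: list of (source_area, target_area, weight) tuples (directed).
    is_happy: dict mapping area -> bool (hypothetical binary happiness label).
    """
    total = sum(weight for _, _, weight in edges)
    if total == 0:
        return 0.0
    same = sum(weight for u, v, weight in edges if is_happy[u] == is_happy[v])
    return same / total


# Toy network (assumed data): areas a and b are happy, c and d are unhappy.
edges = [
    ("a", "b", 8.0),
    ("b", "a", 6.0),
    ("c", "d", 5.0),
    ("a", "c", 1.0),  # the only happy-to-unhappy link
]
is_happy = {"a": True, "b": True, "c": False, "d": False}

share = same_label_share(edges, is_happy)
```

A share well above what random labeling would produce suggests homophily; a proper analysis would compare it against a null model, which this sketch omits.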

3 Conclusion

The four papers in this series are excellent demonstrations of how mobile phone and social media data can contribute to many discoveries about the daily life of individuals, communities, and cities.

Telecom Italia is currently running a second edition of the Challenge [31]. This year, the data are released for seven Italian cities: Bari, Milan, Naples, Rome, Turin, Venice, and Palermo. The datasets include CDRs, demographic data from Telecom Italia (e.g. gender, age range, and living area), Twitter data, energy consumption data, private mobility data (trips performed by customers of some car security and insurance companies), and detailed information on Italian companies (e.g. employees, size, and locations). Hence, there are good reasons to continue with a second edition of this thematic series as a follow-up to the Big Data Challenge 2015.