Harvesting ambient geospatial information from social media feeds
Social media generated from many individuals is playing a greater role in our daily lives and provides a unique opportunity to gain valuable insight into information flow and social networking within a society. Through data collection and analysis of its content, it supports a greater mapping and understanding of the evolving human landscape. The information disseminated through such media represents a deviation from volunteered geography, in the sense that it is not geographic information per se. Nevertheless, the message often has geographic footprints, for example, in the form of locations from where the tweets originate, or references in their content to geographic entities. We argue that such data conveys ambient geospatial information, capturing, for example, people’s references to locations that represent momentary social hotspots. In this paper we present a framework to harvest such ambient geospatial information, and the resulting hybrid capabilities to analyze it in support of situational awareness as it relates to human activities. We argue that the emergence of ambient geospatial analysis represents a second step in the evolution of geospatial data availability, following on the heels of volunteered geographical information.
Keywords: Social media · Social network analysis · Volunteered geographic information · Ambient intelligence
The recent civil unrest events of the Arab Spring, spreading across North Africa and the Middle East in the first months of 2011, confirmed the unprecedented power of social media to communicate information within these societies, and from them to the outside world. Only 20 months after the early, experimental use of twitter, facebook, and YouTube in June 2009 to provide real-time accounts of the situation in the streets of Teheran by disseminating images, video, and news (Newsweek 2009), social media were again at the forefront of information transmission. The information disseminated through such media represents a deviation from Goodchild’s (2007a) notion of volunteered geography, in the sense that it is not geographic information per se. Unlike Wikimapia or OpenStreetMap, social media feeds do not aim to empower citizens to create a patchwork of geographic information: geography is not their message. Nevertheless, the message has geographic footprints, for example, in the form of locations from where the tweets originate, or references in their content to geographic entities (e.g. the numerous references to Tahrir Square during the Egyptian revolution). Accordingly, we argue that such data conveys ambient geospatial information, capturing, for example, people’s references to locations that represent momentary social hotspots. Harvesting this ambient geospatial information provides a unique opportunity to gain valuable insight into information flow and social networking within a society, and may even support a greater mapping and understanding of the human landscape and its evolution over time. In this paper we present a framework to harvest such ambient geospatial information, and the resulting hybrid capabilities to analyze it.
This paper addresses the emergence of new analysis techniques, and resulting hybrid capabilities that take advantage of ambient geographical information (AGI) to support situational awareness as it relates to human activities. We argue that this emergence of ambient geospatial analysis represents a second step in the evolution of geospatial data availability, following on the heels of volunteered geographical information (VGI).
The paper is organized as follows. In "Tracing the rise of ambient geospatial information" we trace the rise of ambient geospatial information following the evolution of Web 2.0 technologies and the emergence of social media. In "System architecture for harvesting information from social media feeds" we present a general framework/architecture for collecting ambient information from social media. "Case studies: turning ambient geospatial data into knowledge" presents, in a non-technical manner, case studies and novel hybrid types of geospatial analysis that can be performed using ambient geospatial information together with social network analysis techniques. In "Discussion and outlook" we offer our assessment and outlook.
Tracing the rise of ambient geospatial information
Much of what is now possible with respect to social media feeds relates to the growth and evolution of Web 2.0 technologies. In this section we present the defining characteristics of Web 2.0 and its relation to geospatial information gathering and dissemination. The term Web 2.0 can be traced back to O’Reilly Media in 2004, who used it to define web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web, utilizing technologies such as social networking, social bookmarking, blogging, Wikis and RSS/XML feeds (Graham 2007). Web 2.0 can be defined by six often overlapping concepts: (1) individual production and user-generated content, (2) harnessing the power of the crowd (e.g. crowdsourcing, see Howe 2006), (3) data on a massive scale, (4) participation-enabling architectures, (5) ubiquitous networking, and finally, (6) openness and transparency (see O’Reilly 2005; Anderson 2007; Batty et al. 2010; for further discussions). Examples of such Web 2.0 applications include MySpace, facebook, flickr, YouTube, and Wikipedia. The growth of Web 2.0 technologies relies heavily on our ability to communicate and share data and information through simple, freely available tools, in contrast to the static websites and data repositories of the past. The aim of Web 2.0 tools is that they can be learnt quickly and effectively without immersion in professional activities (see Hudson-Smith et al. 2009a) such as advanced computer programming skills. Some describe this change as the cult of the amateur with respect to information gathering and content sharing (see Keen 2007).
With relation to spatial data, Web 2.0 has led to a renaissance of geographic information (Hudson-Smith and Crooks 2009). This renaissance was fueled by the immense popularity of tools like Google Maps, which made place matter, and the Google Maps Application Programming Interface (API), which allows practically anyone to create mashups (see Haklay et al. 2008; for more information and discussion), and also by the growth in use of geobrowsers (e.g. Google Earth, NASA’s World Wind). This renaissance put renewed focus on early work to exploit geographical information present in web pages to support various web queries (e.g. Buyukkokten et al. 1999; Gravano et al. 2003).
Considering the particularities of geospatial content as it relates to the above-mentioned six defining themes of Web 2.0, let us consider individual production and user-generated content, which also results in massive amounts of data. In the past, the production and collection of geospatial data (either primary or secondary) was often the first and most vital task of any geographical information system (GIS) project, with data capture costs often accounting for up to 85% of the cost of a GIS (Longley et al. 2010). This has been tipped on its head through crowdsourcing and VGI. Representative examples include the post-earthquake mapping of Haiti in 2010 (e.g. Norheim-Hagtun and Meier 2010; Zook et al. 2010), the Christmas Bird Count (National Audubon Society 2011), and contributions via dedicated services (such as OpenStreetMap or Google Map Maker). OpenStreetMap and Google Map Maker are also perfect examples of participation-enabling architectures that support contributions by domain experts and amateurs alike. While data collection is still an important aspect, some would argue that harnessing the power of the crowd reduces the burden of data collection. Authors have already started to assess the quality of VGI like OpenStreetMap by comparing it to established authoritative mapping organizations such as the United Kingdom’s Ordnance Survey (Haklay 2010). One could consider VGI ‘good enough’ for its purpose, especially in situations where it presents the only reasonable means to collect information in a timely manner. In addition, some would argue that OpenStreetMap, like Wikipedia, is a process of evolving a good product, not a complete product in itself, because there is no end goal in sight as to what constitutes the best map (or the best entry in the case of Wikipedia, see Hudson-Smith et al. 2009b).
Moreover, the internet is becoming more portable and there has been a considerable rise in location-aware devices like smartphones, GPS-enabled cameras, and tablets for data generation. As such data collection is dependent on increased access to the Internet, digital divide issues (the ‘haves’ and ‘have nots’) remain important, leading, for example, to variations between developed and less-developed countries in access to information and therefore in contribution (see: Longley et al. 2006; Buys et al. 2009). The transition from the desktop to a shared and distributed paradigm for data access and contribution allows greater openness and transparency, from top-down government-led efforts such as http://www.whitehouse.gov/open to more bottom-up initiatives such as http://geocommons.com/, both being key components of the emerging Geospatial Web (see Elwood 2010).
All these developments have supported over the past few years the emergence and growth of volunteered geography, with citizens as sensors, actively collecting and contributing geospatial information (Goodchild 2007b), utilizing Web 2.0 technologies and the advances in, and reduced cost of, data collection mechanisms (such as GPS-enabled devices). Just as Web 2.0 has changed how we interact and share information on the Web, so too has web mapping evolved during the last decade, from viewing static data (such as MapQuest) to more dynamic sites with user-generated content. This can be correlated with the emergence of relevant sites having well-defined APIs (such as Google’s My Maps API). Coinciding with sites having APIs are technologies allowing for distributed GIS data collection, from smartphones with GPS to sites with digitization features such as Google Map Maker. Such sites allow people to collect and disseminate geospatial information while bypassing traditional GIS software. Furthermore, through APIs users can create bespoke applications to serve such data, e.g. through web mashups, which have seen substantial increases since 2005, when Google allowed users to access its Google Maps API. However, the analysis capabilities of such tools are often limited. One could consider this to be a legacy of GIS education, in the sense that people often consider GIS as just maps and map displaying, and not the underlying techniques to build such maps. But it also revolves around the purpose of many map mashups: to display data and not to manipulate it. Another barrier to carrying out spatial analysis is of course access to dedicated geographical information software (such as ArcGIS or MapInfo), which was traditionally limited to experts rather than the public at large.
This latter concern is however changing through the development of open source geographical information-related software (such as QGIS, and R), enabling people to manipulate and analyze data, just as OpenOffice allows people to use word processing and spreadsheet packages. Harvesting ambient geospatial information can support capabilities such as:
mapping the manner in which ideas and information propagate in a society, information that can be used to identify appropriate strategies for information dissemination during a crisis situation;
mapping people’s opinions and reaction on specific topics and current events, thus improving our ability to collect precise cultural, political, economic and health data, and to do so at near real-time rates; and
identifying emerging socio-cultural hotspots.
In the next section we present a general architecture for harvesting ambient geospatial information from social media feeds which could be perceived as a merging of crowd-sourcing, VGI, and social media sourcing.
System architecture for harvesting information from social media feeds
Original social media feeds can be retrieved from the source data provider through queries. This entails submitting a query in the form of an http request and receiving in response data in XML format (e.g. Atom or RSS). The query parameters may be based, for example, on location (e.g. specifying an area of interest to which the feed is related), time (e.g. specifying a period of interest), content (e.g. specifying keywords), or even user handle/ID. In response to these queries, and depending on the characteristics of the information provided by the service, we can receive from the server just metadata, or metadata and actual data. A representative example of the first case is flickr, where the query result contains exclusively metadata information (e.g. author, time, and geolocation when available), and information on how to access the actual image itself. A representative example of the second is twitter, where the data received in response to a query are actual tweets and associated metadata (e.g. user information, time of tweet publication, geolocation when available, and information on whether this particular tweet is in response to, or a retweet of, an earlier message).
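As a minimal sketch of this retrieval step, the following Python fragment parses an Atom-formatted query response into per-entry records; the sample feed and its fields are illustrative, not a specific provider’s schema:

```python
import xml.etree.ElementTree as ET

# A minimal Atom-style query response of the kind described above.
# The entries and their fields are illustrative, not a real provider's schema.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example,2011:1</id>
    <title>Gathering reported at Tahrir Square</title>
    <author><name>user_a</name></author>
    <updated>2011-02-01T14:03:00Z</updated>
  </entry>
  <entry>
    <id>tag:example,2011:2</id>
    <title>RT @user_a Gathering reported at Tahrir Square</title>
    <author><name>user_b</name></author>
    <updated>2011-02-01T14:05:00Z</updated>
  </entry>
</feed>"""

ATOM = "{http://www.w3.org/2005/Atom}"

def parse_feed(xml_text):
    """Extract per-entry metadata (id, author, time, content) from an Atom response."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": entry.findtext(ATOM + "id"),
            "author": entry.findtext(ATOM + "author/" + ATOM + "name"),
            "time": entry.findtext(ATOM + "updated"),
            "content": entry.findtext(ATOM + "title"),
        }
        for entry in root.findall(ATOM + "entry")
    ]
```

The same pattern applies to RSS responses, with only the namespace and element names changing.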
Once this information is harvested from the social media server it can be parsed to become part of a local database, mirroring the content of the server provider for the specific entries that were returned by our query. Data parsed from diverse sources are integrated by SMI, a Social Media Ingestor, capturing information that is common across diverse sources (e.g. time of submission, user name, originating location, keywords), as well as service-specific information (e.g. content, links to actual files). This allows us to establish an integrated multi-source local database that can be used to perform analysis of the harvested data that is not supported by the provider database interface (e.g. statistics on user activities) for various projects.
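The normalization performed by the SMI can be sketched as a set of per-service adapters mapping provider-specific records onto the common schema; the input field names below are hypothetical stand-ins for each provider’s actual metadata keys:

```python
def ingest(source, record):
    """Map a service-specific record onto the common local-database schema.

    The input field names are hypothetical stand-ins for each provider's
    actual metadata keys; a real SMI would hold one such adapter per
    harvested service.
    """
    if source == "twitter":
        # twitter returns the data itself (the tweet text) plus metadata
        return {
            "source": "twitter",
            "user": record["user"],
            "time": record["created_at"],
            "location": record.get("coordinates"),
            "content": record["text"],
            "media_url": None,
        }
    if source == "flickr":
        # flickr returns metadata only; the image is fetched separately via its URL
        return {
            "source": "flickr",
            "user": record["owner"],
            "time": record["datetaken"],
            "location": record.get("latlon"),
            "content": record.get("title", ""),
            "media_url": record["url"],
        }
    raise ValueError("no adapter for source: " + source)
```

Because every adapter emits the same keys, cross-source queries (e.g. per-user activity statistics) can run against a single table.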
Cross-entry links are of particular importance, as they reveal semantic relations between entries. Within a particular service such links may become available from the original user community (e.g. retweets or responses in twitter). Additional links can be detected through analysis of the local database entries (e.g. linking entries that refer to the same real-life event, or ones that have comparable content, or using tags). By performing this analysis in our local database we can identify cross-medium links (e.g. linking twitter entries with YouTube videos and flickr images), which allows us to span the boundaries of source services. Relations between entries are valuable because they reveal relations between the submitters of these entries, allowing us to identify the structure of the underlying social network. The most important relations that form the network are forwarding entries written by other members of the social site, replying to messages, or mentioning other members. These are not only very strong indicators of the relations between the submitters, but also indicate information pathways, allowing us to recognize the links through which information is disseminated within different groups, to identify original sources of the information, and to identify social hubs which disseminate it within their networks.
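A simple way to recover such a network from the local database is to count directed author-to-target interactions, e.g. @-mentions and retweets in twitter entries; a stdlib-only sketch (user names hypothetical):

```python
import re
from collections import defaultdict

MENTION = re.compile(r"@(\w+)")  # matches retweets ("RT @user ...") and plain mentions

def build_network(entries):
    """Count directed author -> target interactions (retweets, replies,
    mentions), yielding a weighted edge list of the underlying network."""
    edges = defaultdict(int)
    for author, text in entries:
        for target in MENTION.findall(text):
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return dict(edges)

def top_sources(edges, k=3):
    """Rank users by weighted in-degree: the nodes whose content is most
    often forwarded or referenced, i.e. likely origins of information."""
    indegree = defaultdict(int)
    for (_, target), weight in edges.items():
        indegree[target] += weight
    return sorted(indegree, key=lambda u: indegree[u], reverse=True)[:k]
```

High in-degree nodes correspond to the information sources and social hubs discussed above; the full edge list feeds standard social network analysis.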
The framework presented here has been implemented using PostgreSQL as a database. In our prototype we issue http queries to the source social media sites, using their own query APIs. In our queries we specify certain content parameters (e.g. area of interest). As a response we receive XML files which are parsed to extract content of interest, and subsequently, through the SMI, are inserted in the PostgreSQL database. For practical purposes such queries are not continuous, but are rather issued periodically, depending on the source traffic and regulations (e.g. issued every 5 min for high traffic feeds, or less frequently for lower traffic). While the information harvested from social media in this manner is not explicitly geospatial, it does include implicit geospatial content, thus rendering it suitable for novel types of geospatial analysis, as we present in the following section.
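One way to implement such periodic, non-continuous harvesting is to make each polling cycle incremental, carrying forward the identifier of the newest entry already stored; the query-parameter names in this sketch are illustrative rather than any specific provider’s API:

```python
from urllib.parse import urlencode

def build_query_url(base, keywords, since_id=None, per_page=100):
    """Compose the http request for one polling cycle.

    Passing the id of the newest entry already stored makes each poll
    incremental, so a 5-minute cycle re-fetches nothing it has already
    ingested. Parameter names are illustrative, not a specific API.
    """
    params = {"q": " ".join(keywords), "rpp": per_page}
    if since_id is not None:
        params["since_id"] = since_id
    return base + "?" + urlencode(params)
```

The polling interval itself would be set per feed (e.g. 5 minutes for high-traffic sources), with the newest stored id updated after each successful ingest.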
Case studies: turning ambient geospatial data into knowledge
geospatial hotspot emergence, by monitoring variations over time of references to gazetteer entries ("Hotspot emergence"), and
tracing information dissemination routes in an area of interest, and through it identifying and mapping local social networks in an area ("Tracing information dissemination and social networks").
While we use primarily twitter data to demonstrate these capabilities (the massive amounts and rapid updates of contributions make it more interesting for analysis), we can use the same techniques with any other social media feeds from among the ones collected by our prototype system to identify social networks, the locations of their members, and temporal variations of this information.
There has been growing interest in using social media to track emerging trends, interest in specific events, and daily activity patterns. Many of the applications focus on predicting trends, such as forecasting box-office revenues for movies using social media (see Asur and Huberman 2010). Only recently have people started to take an interest in the geographical aspects of such trends. Sample demonstrations include the recent work of the Centre for Advanced Spatial Analysis at University College London and their new city landscapes, or Tweetgeography, which data mine twitter data from a number of cities around the world (e.g. London, New York and Paris) and map the density of tweets to specific areas, while others have identified hotspots of activity, such as around train stations or coffee shops, using Foursquare data (Wall Street Journal 2011). This is a deviation from traditional social network analysis of such data (for example, who knows whom, who reads whom etc.), and shows the growing interest in location with respect to social media (something to which we will return in "Discussion and outlook").
While in the earlier example we focused on a single location and variations in references to it over time, the analysis can also be performed as a comparative study of multiple locations, as we show in Fig. 4. Here we show the relative traffic amount in social media sites with references to four different locations in Libya on 3/6/11 between 18:00 and 19:00 UTC (Libya local time is UTC+2), at the height of the local civil unrest. We focus on four cities that were major theaters of conflict over that period, with major offensives by the Government attacking the rebels. Figure 4 is actually a snapshot from a video we created showing the variations in social media traffic for these four locations over a period of two weeks. For the specific instance depicted in Fig. 4, we show how references to Tripoli lead, followed by Benghazi, Tubruq, and Al-Zawiyah. While this instantaneous analysis shows how Tripoli leads at that moment, by comparing the data to earlier traffic patterns we can identify that Al-Zawiyah shows a threefold increase in references compared to its past records, thus identifying it as an emerging hotspot among the ones compared. As a reference we should mention here that the period between 3/5/11 and 3/8/11 was the peak of the government offensive on Al-Zawiyah, with intense battles raging in the center of the town, and numerous casualties (see the Guardian 2011).
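This kind of baseline comparison can be sketched as a simple thresholding rule over per-location reference counts; the threefold factor mirrors the Al-Zawiyah example, and in practice would be tuned to the traffic of each feed:

```python
def emerging_hotspots(current, history, factor=3.0):
    """Flag locations whose reference count in the current interval is at
    least `factor` times their historical mean.

    `current` maps place -> count for the interval under analysis;
    `history` maps place -> list of counts from earlier intervals.
    """
    flagged = []
    for place, count in current.items():
        past = history.get(place, [])
        baseline = sum(past) / len(past) if past else 0.0
        if baseline and count >= factor * baseline:
            flagged.append(place)
    return flagged
```

Note that the rule is relative: a city with modest absolute traffic (like Al-Zawiyah) is flagged, while an always-busy city (like Tripoli) is not, even if it leads in raw counts.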
In an effort to assess twitter’s role as a news dissemination mechanism, Kwak et al. (2010) performed a comparison of twitter topics to Google Trends and CNN Headlines, and their preliminary results appear to confirm the role of twitter as a news-breaking mechanism. In the same broader research direction, Becker et al. (2011) present an approach to distinguish messages that convey real-world event information from regular twitter traffic, Sankaranarayanan et al. (2009) presented an approach to extract news content from noisy twitter feeds guided by a small sample of manually identified seeders, which are in essence news-oriented tweeters, and Weng and Lee (2011) presented an approach based on clustering of wavelet-based signals. The detection of breaking events (or first story detection) in twitter streams has been addressed by Petrovic et al. (2010) through the use of algorithms based on locality-sensitive hashing, and this can lead to the detection of news sources in twitter groups. Furthermore, the micro-blogging nature of twitter makes it particularly suitable for reporting real-time events, transforming humans into sensors. For example, Sakaki et al. (2010) presented a system to detect earthquakes in Japan through the aggregation of tweets from various users serving as social sensors. They demonstrated how this information can be used successfully for earthquake detection and reporting through the use of Kalman and particle filters.
These data collections offer a unique analysis potential, as they comprise media and annotation content. Analyzing the media portion of the collection has traditionally been addressed in the image analysis community as a query-and-match problem (finding buildings in a database that look like the one in an image; see for example, Zhang and Kosecka 2006; Schindler et al. 2007). Taking advantage of the annotation content of these datasets provides new analysis opportunities, ranging from improved geolocation solutions (Friedland et al. 2011) to extracting place semantics (Rattenbury and Naaman 2009), generating representative tags for different locations around the world (Kennedy et al. 2007), and even identifying events and corresponding social media documents (Becker et al. 2010). This potentially leads to improved understanding of such annotated multimedia user-contributed collections.
Tracing information dissemination and social networks
While in the previous section we addressed references to physical locations in order to detect the emergence of geospatial hotspots, the analysis of social media feeds provides us with a unique capability to understand the human landscape in unprecedented temporal resolution and spatial detail. Of particular importance to us is the tracing of the manner in which information is disseminated among various groups, and the formation of social networks.
Coincident with the growth of VGI and crisis mapping (see Meier 2009; Biewald and Janah 2010; Parry 2011) we have seen the growth of data mining techniques to explore events. For example, there have been a number of studies using data mining techniques to trawl through traditional media such as news articles (Brownstein et al. 2008), Internet search engines (e.g. Polgreen et al. 2008), and blogs (Corley et al. 2010) to explore disease outbreaks. Other approaches carry out geovisual analytics to support crisis management (see MacEachren et al. 2011 and SensePlace2), often through mashups of geotagged information. With the advent of micro-blogging the focus has also moved to using twitter messages to forecast influenza rates (Culotta 2010) or swine flu pandemics (Ritterman et al. 2009). However, one can use such information to also look at other trends and, more importantly for the scope of this paper, to gain information via social network analysis about the social network structure: who is connected to whom, either directly or via common links, and how persons are clustered in groups sharing common interests. Critical in this analysis is the identification of information dissemination routes, by recognizing major nodes disseminating specific types of information, and their followers. Moreover, if one has locational information pertaining to these data one can geolocate (i.e. geotag) this information and thus map it out over an area of interest. This can be particularly important in crisis situations, supporting management and response, and extending beyond them. Using our prototype system we started collecting twitter data relating to the devastating Sendai (Tohoku) earthquake in Japan (3/11/11), and we present here how this information can be analyzed to collect valuable social network and human landscape information.
While this analysis captures the manner in which information is disseminated within the network, and the different levels of influence of various nodes, it also serves another purpose. Studies support the driving role of homophily in social network activities at large (McPherson et al. 2001; Singla and Richardson 2008) and in twitter in particular (Choudhury et al. 2010), with individuals preferring to associate and interact (and thus cluster) with users of similar background and interests. Accordingly, a valid argument can be made that clusters identified through this analysis comprise individuals who share opinions and are likely to belong to similar classes in the human landscape of the area under investigation. Furthermore, these clusters formed in twitter tend to be spread over larger geographical areas, unlike facebook for example, where studies have shown that friendship relations tend to be geographically clustered and inversely proportional to distance (Backstrom et al. 2010). In fact, Huberman et al. (2009) showed that twitter users interact with small subsets of their social connections, following instead interest- and topic-driven motives in their interaction patterns (Java et al. 2007). Nevertheless, when dealing with a newsworthy situation (extraordinary events) these networks tend to cluster locally. Reports indicate that some degree of correlation appears to exist between the physical location of tweeting individuals relative to the reported event and their network importance (locals gain importance in the network when reporting about a local event), and that local networks tend to become denser when addressing local events (Yardi and Boyd 2010).
In the twitter feeds that we collected for this and comparable experiments, approximately 16% of the feeds carried detailed location information (coordinates), while another 45% of the tweets had locational information at coarser granularity (e.g. city level). There is a disparity of reported values regarding the rates of disclosure of geolocation information by twitter users. For example, shortly after twitter’s introduction Java et al. (2007) reported that approximately 52% of twitter users (39 k out of 76 k) in their study had provided some location information in the corresponding entry of their profiles, while more recently Hecht et al. (2011) reported that two out of three users in their study provided some type of geolocation information. More recently, Cheng et al. (2010) reported that 5% of users in their study listed locational information at the level of coordinates, with another 21% of users listing locational information at the city level. These variations in the reported percentages of geolocated tweets could be attributed to the fact that precise locational information is more often associated with mobile devices, and thus a higher percentage of such information is available in areas where the latest technology is more easily and rapidly adopted. Regardless of the manner in which locational information was obtained, once it is available it can be used to identify the spatial footprint of clusters of social networks like the ones identified in Fig. 8.
The examples we presented above demonstrate newfound capabilities offered by harvesting geospatial information from social media feeds. The collected data can be analyzed to identify clusters of users who share interests and opinions. Further analysis of these clusters of social networks can reveal valuable network information, for example, the main providers of information within them, the manner in which this information is disseminated to large groups of users, and other relevant characteristics. While in our approach we focused on particular geopolitical events (e.g. the Arab Spring events, and the Japan earthquake), the analysis can be performed at any instance and focusing on any topic. In this manner we can collect a variety of parameters describing the composition of crowds, ranging from cultural and political to health and economics. This information can be subsequently analyzed to identify similarities in citizen groups. By geolocating this information we are presented with unprecedented opportunities to harvest human geography data in real-time and at fine spatial resolutions. This information can be valuable to a wide range of operations, ranging from natural disaster response to product market analysis.
Discussion and outlook
The motivation for this paper came from the unprecedented developments in social media and the resulting effects on actively and passively contributed geographic information. These developments provide us with unique opportunities to collect data in real time on an epic scale and to geolocate this information for analysis. Unlike VGI, where people are acting as sensors, in ambient geographical information (AGI) they are also the observations, from which we can gain a better understanding of various parameters of the human landscape. For example, people’s tweets act as sensor measurements in the sense that the data around the Japan earthquake show concern and responses, and also capture the way in which events become part of normal life, while the data from Egypt show potential hotspot emergence. One could consider these as altering notions of how we explore geographical and social systems. We can observe the collapse of a physical infrastructure as it is affecting its people (e.g. Japan), or the collapse of a social system while leaving the physical infrastructure intact (e.g. Cairo) or not (e.g. Libya). In a sense, such data streams harvested from human sensors have similarities to how one uses rain and stream gauges to monitor flooding in early warning systems.
Unlike VGI, AGI focuses upon passively contributed data and the paper has highlighted a number of applications whereby one can harvest this ambient geospatial data and turn this information into knowledge about what is happening around the world. However, one has to note that there is an issue of only getting a sample of the population when collecting AGI from social media feeds, namely individuals who are active in this arena. Nevertheless this sample is rapidly growing, as relevant technology adoption is becoming more ubiquitous.
This rise in social media and the ability to analyze it raises several concerns with respect to the suitability of traditional mapping and GIS solutions to handle this type of information. We no longer map just buildings and infrastructure; we can now map abstract concepts like the flow of information in a society, add contextual information to place, and link quantitative and qualitative analysis in human geography. In a sense, one could consider AGI to be addressing the fact that the human social system is a constantly evolving complex organism, in which people’s roles and activities adapt to changing conditions and affect events in space and time. By moving beyond simple mashups of social media feeds to actual analysis of their content we gain valuable insight into this complex system.
What is to come next is difficult to predict. For example, consider that only 10 years ago the idea of location-based services and GPS embedded into mobile devices was still in its infancy. Advances in tools and software made geographical information gathering easier, resulting in a growing trend toward crowdsourcing geographical data rather than using authoritative sources (such as National Mapping agencies). More recently, the popularity of geographically tagged social media is facilitating the emergence of location as a commodity that can be used in organizing content, planning activities, and delivering services. We expect this trend to increase as mobile devices become more locationally aware. One could relate this to the growing usage of online activities and services, such as real-time information on social media sites like Foursquare, facebook places, Google Latitude, twitter and Gowalla, and a host of new sites and services emerging with time. More static sites (in the sense that one can upload at any time), such as flickr and YouTube, also provide a means of viewing, and in a sense forming an opinion of, a place without actually visiting it.
Harvesting ambient information brings forward novel challenges with respect to privacy, as analysis can reveal information that the contributor did not explicitly communicate (see Friedland and Sommer 2010). This is not a new trend; it has been happening for a while now. Google itself is essentially a marketing tool, using the information it collects to improve its customer services. Similarly, twitter generates revenue by licensing its tweet fire hose to search engines, while companies can pay for “promoted tweets” (see Financial Times 2010). This trend has already spread to locational information. For example, TomTom (2011) has been providing passively sensed data to help police place speed cameras, and iPhones have been found to store locational data without the user’s awareness (BBC 2011). However, progress is being made in highlighting the privacy one relinquishes when sharing locational information. Sites and applications such as pleaserobme.com or Creepy, a geolocation aggregator, have demonstrated the potential of aggregating social media to pinpoint user locations. Protecting people’s identities in times of unrest is also a well-recognized concern; for example, the Standby Task Force (2011) suggests ways of limiting exposure and delaying information during the recent unrest in North Africa.
The power of harvesting AGI, however, stems from gaining a deeper understanding of groups rather than examining specific individuals. As the popularity of social media grows exponentially, we are presented with unique opportunities to identify and understand information dissemination mechanisms and patterns of activity in both the geographical and social dimensions, allowing us to optimize responses to specific events, while the identification of emerging hotspots helps us allocate resources to meet forthcoming needs.
While we presented a general architecture upon which we based our own system to collect such information, we should note that there also exist a number of comparable tools, such as 140kit (http://140kit.com/) or twapperkeeper (http://twapperkeeper.com/), although these are limited in their scalability with respect to large datasets. Sites such as ushahidi (http://www.ushahidi.com/) also provide a means to collect and disseminate information over the web. However, very few tools allow one to add context to content or to support detailed analysis.
Hashtags represent a bottom-up, user-generated convention for adding context (in a sense, metadata) about a specific topic, by identifying keywords that describe the content. This allows easy searching of tweets and trends. Sites such as http://hashtags.org/ monitor such trends in tweets and provide relevant statistics, but only over short periods of time.
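As an illustration of how easily this user-generated metadata can be mined, the following minimal Python sketch (the function name and sample tweet are our own, for illustration only) extracts hashtag keywords from tweet text:

```python
import re

def extract_hashtags(text):
    """Return hashtag keywords (without the leading '#') found in tweet text."""
    return re.findall(r"#(\w+)", text)

tweet = "Crowds gathering in #Tahrir square #Egypt #Jan25"
print(extract_hashtags(tweet))  # ['Tahrir', 'Egypt', 'Jan25']
```

Aggregating such keywords over many tweets is what allows sites like hashtags.org to compute trend statistics.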
The tweets were collected by searching via the twitter API within a 30 km radius of the given city, or by using the users’ twitter profile locations.
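Such a radius-constrained search can be expressed through the search API’s geocode parameter (latitude, longitude, radius). The sketch below only constructs the query parameters, since endpoint and authentication details vary across API versions; the function name and coordinates are our own, for illustration:

```python
def build_geocode_query(keyword, lat, lon, radius_km=30):
    """Build search parameters restricting results to a radius around a point."""
    return {
        "q": keyword,
        # twitter's search API expects "lat,lon,radius", e.g. "35.6762,139.6503,30km"
        "geocode": f"{lat:.4f},{lon:.4f},{radius_km}km",
        "count": 100,
    }

params = build_geocode_query("earthquake", 35.6762, 139.6503)  # around Tokyo
print(params["geocode"])  # 35.6762,139.6503,30km
```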
Examples include: monitoring swine flu (http://compepi.cs.uiowa.edu/~alessio/twitter-monitor-swine-flu/) or twitter traffic and the Oscars (http://www.neoformix.com/2009/OscarTwitterMap.html).
Simply stated, social network analysis (SNA) allows us to explore how different parts of a social system (e.g. people, organizations) are linked together. Moreover, it allows one to characterize the system’s structure and its evolution over time (e.g. kinship or role-based networks). SNA is a quantitative methodology that uses mathematical graphs to represent people or organizations: each person is a node, and nodes are connected to others via links (edges). Such links can be directed or undirected (e.g. friendship need not be reciprocal).
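A minimal sketch of such a directed graph, implemented here with plain Python dictionaries rather than a dedicated SNA library, illustrates nodes, directed edges, and degree measures (the class, method names, and toy retweet data are all hypothetical):

```python
from collections import defaultdict

class DirectedGraph:
    """Toy directed social graph: an edge u -> v means "u retweeted v".
    Directed edges capture the fact that such links need not be reciprocal."""

    def __init__(self):
        self.edges = defaultdict(set)  # node -> set of nodes it points to

    def add_edge(self, u, v):
        self.edges[u].add(v)

    def out_degree(self, u):
        # number of accounts u has retweeted
        return len(self.edges[u])

    def in_degree(self, v):
        # number of accounts that retweeted v (a simple influence measure)
        return sum(1 for targets in self.edges.values() if v in targets)

g = DirectedGraph()
g.add_edge("alice", "nhk")  # alice retweets nhk
g.add_edge("bob", "nhk")
g.add_edge("nhk", "alice")
print(g.in_degree("nhk"))  # 2
```

In-degree here serves as a crude proxy for influence: heavily retweeted accounts (such as NHK_PR in the Japan dataset) stand out as high in-degree nodes.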
The three most retweeted tweets in this dataset, in chronological order, were: Tweet 1 from NHK_PR (2011-03-11 13:51:23), asking people to remain calm, retweeted 303 times during the period of interest; Tweet 2 from NHK_PR (2011-03-11 13:54:59), a warning to switch off power to homes before evacuating, retweeted 435 times; and Tweet 3 from NHK_PR (2011-03-11 14:05:45), a tsunami warning, retweeted 258 times.