Harvesting ambient geospatial information from social media feeds

Social media generated by many individuals is playing a growing role in our daily lives and provides a unique opportunity to gain valuable insight into information flow and social networking within a society. Collecting and analyzing its content supports a better mapping and understanding of the evolving human landscape. The information disseminated through such media represents a deviation from volunteered geography, in the sense that it is not geographic information per se. Nevertheless, the messages often have geographic footprints, for example in the form of the locations from which tweets originate, or references in their content to geographic entities. We argue that such data convey ambient geospatial information, capturing, for example, people’s references to locations that represent momentary social hotspots. In this paper we present a framework to harvest such ambient geospatial information, and the resulting hybrid capabilities to analyze it in support of situational awareness as it relates to human activities. We argue that this emergence of ambient geospatial analysis represents a second step in the evolution of geospatial data availability, following on the heels of volunteered geographic information.

  1. While we present a general architecture upon which we based our own system to collect such information, we should note that there also exist a number of comparable tools, such as 140kit or twapperkeeper, but these are limited in their scalability with respect to large datasets. Sites such as ushahidi also provide a means to collect and disseminate information over the web. However, there are very few tools that allow one to add context to content, or to support detailed analysis.

  2. Readers are referred to and for further information.

  3. Hashtags represent a bottom-up, user-generated convention for adding content (in a sense, metadata) about a specific topic, by identifying keywords to describe content. They thus allow easy searching of tweets and trends. Sites such as monitor such trends from tweets and provide relevant statistics, but only over short periods of time.

  4. The tweets were obtained by searching with the Twitter API within a 30 km radius of the given city, or by using the users' Twitter profile locations.


  6. Examples include monitoring swine flu, or Twitter traffic and the Oscars.

  7. Simply stated, social network analysis (SNA) allows us to explore how different parts of a social system (e.g. people, organizations) are linked together. Moreover, it allows one to define the system’s structure and its evolution over time (e.g. kinship or role-based networks). SNA is a quantitative methodology that uses mathematical graphs to represent people or organizations, where each person is a node, and nodes are connected to others via links (edges). Such links can be directed or undirected (e.g. friendship networks don’t have to be reciprocal).

  8. The three most retweeted tweets in this specific dataset, in chronological order, were: Tweet 1 from NHK_PR: 2011-03-11 13:51:23, asking people to remain calm; the number of retweets in the period of interest was 303. Tweet 2 from NHK_PR: 2011-03-11 13:54:59 was a warning to switch off power to homes before evacuating; the number of retweets in the period of interest was 435. Tweet 3 from NHK_PR: 2011-03-11 14:05:45 was a tsunami warning; the number of retweets in the period of interest was 258.
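
The 30 km radius search mentioned in note 4 was performed through the Twitter API itself. As a rough illustration of what such a radius constraint means geometrically, the following minimal sketch filters already-collected, geotagged records locally using the haversine great-circle distance. The record layout (`id`, `lat`, `lon`), the function names, and the sample coordinates are assumptions made for this example and do not come from the authors' system.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def tweets_within_radius(tweets, center, radius_km=30.0):
    """Keep only the records whose coordinates lie within radius_km of center."""
    lat0, lon0 = center
    return [t for t in tweets if haversine_km(lat0, lon0, t["lat"], t["lon"]) <= radius_km]


# Hypothetical sample data: a city centre and two geotagged records.
TOKYO = (35.6895, 139.6917)
sample_tweets = [
    {"id": "a", "lat": 35.4437, "lon": 139.6380},  # Yokohama, under 30 km away
    {"id": "b", "lat": 34.6937, "lon": 135.5023},  # Osaka, several hundred km away
]

if __name__ == "__main__":
    for tweet in tweets_within_radius(sample_tweets, TOKYO):
        print(tweet["id"])
```

Records that carry only a free-text profile location, the second case in note 4, would first need to be geocoded to coordinates before a filter of this kind could be applied.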
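
The graph representation described in note 7 can be sketched with plain Python data structures. The example below treats each retweet as a directed edge from the retweeting account to the original author, counts incoming edges per node (one simple way to see which accounts are retweeted by the most distinct users), and checks whether a link is reciprocated. The account names and helper functions are hypothetical, and this is an illustrative sketch rather than the analysis pipeline used in the paper.

```python
from collections import defaultdict


def build_retweet_graph(retweets):
    """Build a directed graph from (retweeter, original_author) pairs.

    The graph maps each source node to the set of nodes it links to.
    """
    graph = defaultdict(set)
    for retweeter, author in retweets:
        graph[retweeter].add(author)  # directed edge: retweeter -> author
    return graph


def in_degree(graph):
    """Count, for every node, how many distinct nodes link to it."""
    counts = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return dict(counts)


def is_reciprocal(graph, u, v):
    """True only if both u -> v and v -> u exist, i.e. the link is mutual."""
    return v in graph.get(u, set()) and u in graph.get(v, set())


# Hypothetical sample data: who retweeted whom.
sample_retweets = [
    ("user_a", "news_account"),
    ("user_b", "news_account"),
    ("user_a", "user_b"),
]

if __name__ == "__main__":
    g = build_retweet_graph(sample_retweets)
    print(in_degree(g))
    print(is_reciprocal(g, "user_a", "user_b"))
```

Storing edges as sets means repeated retweets between the same pair collapse into one link; a weighted variant would keep a count per edge instead.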



Stefanidis, A., Crooks, A. & Radzikowski, J. Harvesting ambient geospatial information from social media feeds. GeoJournal 78, 319–338 (2013).

