
1 Introduction

Twitter, as a cost-effective and time-saving communication channel, plays an important role in emergencies. It has become a substantial information source for response and recovery [28] and for grassroots management of relief activities [10, 27]. Invaluable efforts have been made to utilize social media for preparedness, relief, and recovery during and after natural disasters [11]. However, current automatic solutions require human intervention at many stages due to numerous challenges. Among these challenges is the lack of geographical context, which is important for response and relief. For example, during a disaster, responders need to locate incidents (e.g., road closures, infrastructure damage) as they happen, as well as the tweets and users discussing the disaster. However, people often tend to hide their geographical information due to privacy and safety concerns [16, 22]. Anderson et al. [12] analysed disaster tweet datasets spanning a period of six years and showed that only around 2% or fewer of the tweets are geo-referenced. Thus, developing automatic geolocation tools would enable real-time location-aware monitoring of the disaster, making the decision-making process more reliable, effective, and efficient.

In my work, I am interested in tackling the Location Mention Prediction (LMP) problem during time-critical situations. The problem involves two tasks that can be tackled separately or jointly: (1) Location Recognition: extracting the location mentions in tweets, and (2) Location Disambiguation: locating potential location mentions on the map. The disambiguation task includes two identification sub-tasks: (2.1) identifying the intended location from a set of location mentions sharing the same toponym, and (2.2) identifying the locational focus of a tweet containing different location mentions (a concrete input/output example is sketched below). Learning to predict location mentions is a non-trivial task. Location taggers have to address many challenges, including microblogging-specific challenges (e.g., tweet sparsity, noisiness, the rapidly-changing stream, hashtag riding) and task-specific challenges (e.g., the time-criticality of the solution, the scarcity of labeled data). While tackling these challenges, I aim to address several research questions: RQ1. Are deep learning approaches more effective than the state-of-the-art and traditional machine learning-based LMP approaches? RQ2. Would context expansion (using the user's tweets, on-topic tweets, etc.) improve LMP? RQ3. How can we reduce the effect of the scarcity of labeled data on the performance of the LMP system? RQ4. How can LMP systems control the trade-off between effectiveness and efficiency during crisis scenarios?
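To make the task concrete, the following sketch illustrates the expected input and output of the two sub-tasks on a hypothetical tweet; the mention spans, gazetteer entry, and coordinates are illustrative assumptions rather than values drawn from any existing dataset.

# A hypothetical example of the two LMP sub-tasks on a single tweet.
# All values (character offsets, gazetteer entry, coordinates) are illustrative.
tweet = "Flooding reported on Main St near Springfield, send help!"

# (1) Location Recognition: character spans of the location mentions.
recognized = [
    {"mention": "Main St", "start": 21, "end": 28},
    {"mention": "Springfield", "start": 34, "end": 45},
]

# (2) Location Disambiguation: each mention is resolved to a gazetteer
# entry, picking the intended "Springfield" among the many places
# sharing that toponym, and the tweet's locational focus is selected.
disambiguated_focus = {
    "mention": "Springfield",
    "gazetteer_entry": "Springfield, MA, USA",
    "lat": 42.10,
    "lon": -72.59,
}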

The remainder of this paper is organized as follows. The related literature is reviewed in Sect. 2. My proposed methodology is discussed in Sect. 3, followed by the evaluation setup in Sect. 4.

2 Related Work

In this section, I discuss work related to the LMP problem over tweets.

TwitterStand [23] is a tweet geo-tagging system for extracting breaking news and pinning it on the map. Lingad et al. [18] compared several NER tools on disaster-related Twitter data and found their performance noticeably degraded over the Twitter stream. Li et al. [14], on the other hand, constructed their own noisy gazetteer using a crowdsourcing-like method to match the location mentions extracted from tweets by their POI tagger. Malmasi et al. [19] extracted noun phrases (NPs) in tweets using a recursive rule-based tree parser and linked potential locations to Geonames entries using fuzzy matching. Ghahremanlou et al. [7] explored combined techniques to identify location mentions using both gazetteer matching and StanfordNER. The major weakness of gazetteer-based methods is the mismatch between the noisy Twitter stream and non-noisy gazetteer entries [17]. To address this issue, Li et al. [15] constructed their own noisy gazetteer using cross-posts on Twitter collected from Foursquare check-ins. Alternatively, Sultanik and Fink [25] used an Information Retrieval (IR)-based approach to identify the location mentions in tweets. Unlike Ghahremanlou et al. [7], Yin et al. [29] retrained StanfordNER on a tweet dataset to effectively identify the location mentions in tweets. More interestingly, to achieve high coverage of recognized locations, a couple of studies [6, 30] adopted an ensemble-based parser.

In 2014, the topic of the fifth Australasian Language Technology Association (ALTA) shared task was identifying location mentions in tweets [21]. Participants explored several techniques such as feature engineering, ensemble classifiers, rule-based classification, knowledge infusion, CRF sequence labelers, and semi-supervision. Al-Olimat et al. [1] proposed identifying location names by traversing a tree of the tweet's n-grams to extract valid locations that exist in their pre-built region-specific gazetteer. Moreover, Hoang and Mothe [8] combined syntactic and semantic features to train traditional ML-based models, whereas Kumar and Singh [13] trained a Convolutional Neural Network (CNN) model that learns a continuous representation of the tweet text and then identifies the location mentions.

The gap in existing solutions is two-fold. First, in relation to methods, only a few studies have investigated deep learning-based solutions, most of the proposed solutions are gazetteer-based, and most of them do not consider efficiency. Second, there is no unified evaluation framework: only a few small-scale datasets are available, and the different tools are not compared against each other. Additionally, the efficiency of the proposed methods is rarely evaluated.

3 Proposed Research

In this section, I describe the proposed solutions to address the research questions listed in Sect. 1.

Deep Location Prediction (RQ1): I perceive the location recognition task as a multi-label classification task. I opt to use Neural Network (NN) algorithms due to their ability to learn features and model parameters simultaneously from incomplete or noisy training data [4]. I specifically plan to experiment with (1) Bidirectional Long Short-Term Memory networks [9, 24], (2) Encoder-Decoder models with attention [2, 26], and (3) BERT with fine-tuning [5]. For the disambiguation task, I plan to explore the effectiveness of Siamese Neural Networks [3], which have recently been used in neural IR models, especially for short-text matching [20]. I further plan to experiment with character n-grams to better capture lexical information.
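As a minimal sketch of option (3), the snippet below loads a BERT-style encoder for location recognition, assuming a standard BIO token-tagging formulation and the HuggingFace Transformers API; the checkpoint name and label set are illustrative assumptions, and the training loop (token-level cross-entropy over labeled tweets, e.g., via transformers.Trainer) is omitted.

import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-LOC", "I-LOC"]     # BIO tags for location mentions
MODEL_NAME = "bert-base-cased"       # illustrative BERT-style checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

def tag_locations(tweet: str):
    """Return a (word piece, predicted BIO tag) list for one tweet."""
    enc = tokenizer(tweet, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits            # shape: (1, seq_len, num_labels)
    pred = logits.argmax(-1).squeeze(0).tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return [(tok, LABELS[p]) for tok, p in zip(tokens, pred)]

For the disambiguation task, the same encoder could feed a Siamese architecture that scores (location mention, gazetteer entry) pairs.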

Context Expansion (RQ2): Due to the short length of tweets, systems lack the context that would enable them to detect location mentions effectively. To enrich the context of a tweet, I plan to explore four expansion sources: (1) User's tweets: I hypothesize that tweets shared by the same user within a time window, say 10 minutes, are most probably discussing the same topic, (2) On-topic tweets: I assume that tweets sharing a trending hashtag related to the disaster are topically relevant, (3) Linked webpages: I anticipate that the linked URLs are useful sources for enriching the tweet context, and (4) Knowledge-bases (KBs): I hypothesize that entity recognition and analysis using external auxiliary data, e.g., knowledge-bases, can aid in understanding the spatial focus of a tweet, which in turn enables LMP. I plan to use general-purpose KBs and study the effectiveness of knowledge-base population and acceleration techniques to maintain an online, up-to-date KB during the disaster.
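A minimal sketch of source (1), the user's tweets, is given below; it assumes tweets are available as dictionaries with user, id, time, and text fields, which is an illustrative structure rather than a fixed API.

from datetime import timedelta

def expand_with_user_tweets(tweet, stream, window_minutes=10):
    """Enrich a tweet's text with the same user's posts published
    within a +/- window_minutes window around it."""
    window = timedelta(minutes=window_minutes)
    extra = [t["text"] for t in stream
             if t["user"] == tweet["user"]
             and t["id"] != tweet["id"]
             and abs(t["time"] - tweet["time"]) <= window]
    return " ".join([tweet["text"]] + extra)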

Handling Data Scarcity (RQ3): Deep learning algorithms are data-hungry, and acquiring labeled data requires budget and expensive resources. To address this challenge, I investigate possible ways to reduce the effect of data scarcity during disasters, such as (1) Exploiting existing data: using training data from past disasters of the same or a different disaster type to train prediction systems; I plan to leverage one-step domain adaptation techniques (e.g., divergence-based methods), (2) Acquiring cheaper data: I plan to explore the effectiveness of expanding the small labeled datasets of a current disaster using semi-supervision, weak supervision, and active learning methods, and (3) Reusing pre-trained tools: I plan to study the effectiveness of tools already trained on past disasters (subject to their availability) for effective LMP on new disasters (e.g., via transfer learning).
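As an example of option (2), the sketch below projects gazetteer entries onto unlabeled tweets to produce noisy (weak) location labels that could pre-train a tagger before fine-tuning on the small labeled set; the tiny gazetteer is a placeholder, not an endorsement of any particular resource.

import re

GAZETTEER = {"main st", "springfield", "riverside park"}   # placeholder entries

def weak_label(tweet_text):
    """Return (start, end) character spans of gazetteer matches,
    to be used as noisy supervision for location recognition."""
    spans = []
    lowered = tweet_text.lower()
    for entry in GAZETTEER:
        for m in re.finditer(r"\b" + re.escape(entry) + r"\b", lowered):
            spans.append((m.start(), m.end()))
    return sorted(spans)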

Effectiveness and Efficiency Trade-off (RQ4): As I plan to tackle the recognition task in the disaster domain, I aim to train my models in real time while the disaster is happening. Thus, I plan to investigate the trade-off between the effectiveness and efficiency of LMP systems. Possible paths to study this trade-off are: (1) Tuning system decisions by, for example, prioritizing tweets for LMP instead of checking every tweet chronologically, (2) Analyzing time and space complexities, and (3) Exploring possible ways to modify the effectiveness measures to account for efficiency.
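The sketch below illustrates path (1): instead of tagging tweets strictly in arrival order, they are pulled from a priority queue ranked by a cheap heuristic score; the particular heuristic (retweet count plus a bonus for disaster-related hashtags) is an illustrative assumption.

import heapq

def priority(tweet, disaster_tags=("#flood", "#earthquake")):
    """Cheap heuristic score; higher means process sooner."""
    score = tweet.get("retweets", 0)
    if any(tag.lower() in disaster_tags for tag in tweet.get("hashtags", [])):
        score += 100
    return score

def prioritized(batch):
    """Yield a batch of tweets in decreasing priority instead of arrival order."""
    heap = [(-priority(t), i, t) for i, t in enumerate(batch)]   # max-heap via negation
    heapq.heapify(heap)
    while heap:
        _, _, tweet = heapq.heappop(heap)
        yield tweet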

4 Experimental Evaluation

To evaluate the proposed LMP approaches, precision, recall, and F1 scores will be computed. When a system manages to extract only part of a location mention, it is penalized by counting the false positives and false negatives multiplied by the percentage of overlap between the system's output and the ground truth. To conduct the initial evaluation, the publicly-available English LMP datasets, which are samples of disaster-specific streams [1, 21, 29], will be used. I anticipate the solutions to generalize to other data domains that share the same properties as the Twitter stream. The Geonames and OpenStreetMap gazetteers will be utilized for the evaluation of the disambiguation task. The approaches reviewed in Sect. 2, subject to their availability and reproducibility, will serve as baselines for the proposed approaches across all tasks: (1) Twitter-based location mention detection and disambiguation tools (e.g., LNEx [1]), (2) academic NER taggers (e.g., StanfordNER), and (3) commercial NER taggers (e.g., Google NL).
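The sketch below shows one possible reading of this overlap-weighted scoring, assuming spans are given as (start, end) character offsets: a predicted span earns fractional credit proportional to its overlap with the best-matching ground-truth span, and the uncovered remainder counts toward the false positives and false negatives; the exact weighting used in the final evaluation may differ.

def overlap(a, b):
    """Character overlap between two (start, end) spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def partial_match_prf(pred_spans, gold_spans):
    """Overlap-weighted precision, recall, and F1 for one tweet."""
    tp = 0.0
    for p in pred_spans:
        best = max((overlap(p, g) / max(p[1] - p[0], g[1] - g[0])
                    for g in gold_spans), default=0.0)
        tp += best
    fp = len(pred_spans) - tp                  # unmatched predicted mass
    fn = max(len(gold_spans) - tp, 0.0)        # unmatched ground-truth mass
    precision = tp / (tp + fp) if pred_spans else 0.0
    recall = tp / (tp + fn) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1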