Twitter Geolocation Prediction Using Neural Networks

Open Access
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10713)

Abstract

Knowing the location of a user is important for several use cases, such as location-specific recommendations, demographic analysis, or monitoring of disaster outbreaks. We present a bottom-up study on the impact of text- and metadata-derived contextual features for Twitter geolocation prediction. The final model incorporates individual types of tweet information and achieves state-of-the-art performance on a publicly available test set. The source code of our implementation, together with pretrained models, is freely available at https://github.com/Erechtheus/geolocation.

1 Introduction

Data from social media platforms is an attractive real-time resource for data analysts. It can be used for a wide range of use cases, such as monitoring fires (Power et al. 2013) and flu outbreaks (Paul et al. 2014), providing location-based recommendations (Ye et al. 2010), or performing demographic analyses (Sloan et al. 2013). Although some platforms, such as Twitter, allow users to geolocate posts, Jurgens et al. (2015) reported that less than 3% of all Twitter posts are geotagged. This severely impacts the use of social media data for such location-specific applications.

The location prediction task can be tackled either as a classification problem or as a multi-target regression problem. In the former case the goal is to predict city labels for a specific tweet, whereas in the latter case latitude and longitude coordinates are predicted for a given tweet. Previous studies showed that text in combination with metadata can be used to predict user locations (Han et al. 2014). Liu and Inkpen (2015) presented a system based on stacked denoising auto-encoders (Vincent et al. 2008) for location prediction. State-of-the-art approaches, however, often rely on very specific, non-generalizing features based on website scraping, IP resolution, or external resources such as GeoNames. In contrast, we present an approach for geographical location prediction that achieves state-of-the-art results using neural networks trained solely on Twitter text and metadata. It requires no external knowledge sources and hence generalizes more easily to new domains and languages.

The remainder of this paper is organized as follows: First, we provide an overview of related work for Twitter location prediction. In Sect. 3 we describe the details of our neural network architecture. Results on the test set are shown in Sect. 4. Finally, we conclude the paper with some future directions in Sect. 5.

2 Related Work

For better comparability of our approach, we focus on the shared task presented at the 2nd Workshop on Noisy User-generated Text (WNUT’16) (Han et al. 2016). The organizers introduced a dataset to evaluate individual approaches for tweet- and user-level location prediction. For tweet-level prediction the goal is to predict the location of one specific message, while for user-level prediction the goal is to predict the user location based on a variable number of user messages. The organizers evaluated team submissions based on accuracy and distance in kilometers. The latter metric accounts for predictions that are wrong but geographically close, for example, when the model predicts Vienna instead of Budapest.
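The error distance between a predicted and a gold city is typically computed with the haversine (great-circle) formula. A minimal sketch follows; note that the shared task's exact scoring script may differ, and the city-center coordinates used below are approximate:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Vienna vs. Budapest (approximate city centers): wrong label, small distance error
print(round(haversine_km(48.2082, 16.3738, 47.4979, 19.0402)))  # roughly 214 km
```

This illustrates why the distance metrics complement accuracy: the Vienna/Budapest confusion counts as a full classification error but contributes only ~200 km to the median and mean distance.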

We focus on the five teams who participated in the WNUT shared task. Official team results for tweet- and user-level predictions are shown in Table 1. Unfortunately, only three participants provided system descriptions, which we briefly summarize:
Table 1.

Official WNUT’16 tweet- and user-level results ranked by tweet median error distance (in kilometers). Individual best results for all three criteria are highlighted in bold face.

| Submission  | Tweet Acc | Tweet Median | Tweet Mean  | User Acc  | User Median | User Mean |
|-------------|-----------|--------------|-------------|-----------|-------------|-----------|
| FujiXerox.2 | 0.409     | **69.5**     | **1,792.5** | 0.476     | **16.1**    | 1,122.3   |
| csiro.1     | **0.436** | 74.7         | 2,538.2     | **0.526** | 21.7        | 1,928.8   |
| FujiXerox.1 | 0.381     | 92.0         | 1,895.4     | 0.464     | 21.0        | **963.8** |
| csiro.2     | 0.422     | 183.7        | 2,976.7     | 0.520     | 23.1        | 2,071.5   |
| csiro.3     | 0.420     | 226.3        | 3,051.3     | 0.501     | 30.6        | 2,242.4   |
| Drexel.3    | 0.298     | 445.8        | 3,428.2     | 0.352     | 262.7       | 3,124.4   |
| aist.1      | 0.078     | 3,092.7      | 4,702.4     | 0.098     | 1,711.1     | 4,002.4   |
| cogeo.1     | 0.146     | 3,424.6      | 5,338.9     | 0.225     | 630.2       | 2,860.2   |
| Drexel.2    | 0.082     | 4,911.2      | 6,144.3     | 0.079     | 4,000.2     | 6,161.4   |
| Drexel.1    | 0.085     | 5,848.3      | 6,175.3     | 0.080     | 5,714.9     | 6,053.3   |

Team FujiXerox (Miura et al. 2016) built a neural network using text, user-declared locations, timezone values, and user self-descriptions. For feature preprocessing, the authors built several mapping services using external resources such as GeoNames and time zone boundaries. They then trained a neural network using the fastText n-gram model (Joulin et al. 2016) on post text, user location, user description, and user timezone.

Team csiro (Jayasinghe et al. 2016) used an ensemble learning method built on several information resources. First, the authors use post texts, user location text, user time zone information, messenger source (e.g., Android or iPhone) and reverse country lookups for URL mentions to build a list of candidate cities contained in GeoNames. Furthermore, they scraped specific URL mentions and screened the website metadata for geographic coordinates. Second, a relationship network is built from tweets mentioning another user. Third, posts are used to find similar texts in the training data to calculate a class-label probability for the most similar tweets. Fourth, text is classified using the geotagging tool pigeo (Rahimi et al. 2016). The output of individual stages is then used in an ensemble learner.

Team cogeo (Chi et al. 2016) employed multinomial naïve Bayes, focusing on textual features (i.e., location indicative words, GeoNames gazetteers, user mentions, and hashtags).

3 Methods

We used the WNUT’16 shared task data consisting of 12,827,165 tweet IDs, each assigned to a metropolitan city center from the GeoNames database using the strategy described in Han et al. (2012). As Twitter does not permit sharing individual tweets, posts must be retrieved via the Twitter API; we were able to retrieve 9,127,900 (71.2%) of them. The remaining tweets are no longer available, usually because users deleted these messages. In comparison, the winners of the WNUT’16 task (Miura et al. 2016) reported successfully retrieving 9,472,450 (73.8%) tweets. The overall training data covers 3,362 individual class labels (i.e., city names); in our dataset we observed only 3,315 different classes.

For text preprocessing, we use a simple whitespace tokenizer with lower casing, without any domain-specific processing such as Unicode normalization (Davis et al. 2001) or lexical text normalization (see, for instance, Han and Baldwin (2011)). The text of tweets and the metadata fields containing text (user description, user location, user name, timezone) are converted to word embeddings (Mikolov et al. 2013), which are then forwarded to a Long Short-Term Memory (LSTM) unit (Hochreiter and Schmidhuber 1997). In our experiments we randomly initialize the embedding vectors. We use batch normalization (Ioffe and Szegedy 2015) to normalize inputs and thereby reduce internal covariate shift. The risk of overfitting through co-adapting units is reduced by applying dropout (Srivastava et al. 2014) between individual neural network layers. An example architecture for textual data is shown in Fig. 1a. Metadata fields with a finite set of elements (UTC offset, URL domains, user language, tweet publication time, and application source) are converted to one-hot encodings, which are forwarded to an internal embedding layer, as proposed by Guo and Berkhahn (2016). Again, batch normalization and dropout are applied to avoid overfitting. This architecture is shown in Fig. 1b.
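The two input pipelines can be sketched in plain NumPy. This is not the authors' implementation: the vocabulary, field values, and the 50-dimensional categorical embedding size are made up for illustration (the 100-dimensional text embedding follows Table 2):

```python
import numpy as np

rng = np.random.default_rng(0)

# Whitespace tokenization with lower casing, as described in the text
def tokenize(text):
    return text.lower().split()

# Toy vocabulary and randomly initialized word embeddings (text dim = 100)
vocab = {"<unk>": 0, "coffee": 1, "in": 2, "melbourne": 3}
emb_dim = 100
embeddings = rng.normal(size=(len(vocab), emb_dim))

tokens = tokenize("Coffee in Melbourne")
ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]
text_input = embeddings[ids]  # shape (3, 100); this sequence feeds the LSTM

# Categorical metadata (e.g., user language) as a one-hot vector,
# mapped to a dense vector by an internal embedding layer
langs = ["en", "id", "ja"]
one_hot = np.eye(len(langs))[langs.index("en")]  # shape (3,)
lang_emb_matrix = rng.normal(size=(len(langs), 50))
lang_input = one_hot @ lang_emb_matrix           # shape (50,)

print(text_input.shape, lang_input.shape)
```

In the real model these representations additionally pass through batch normalization and dropout before any further layers.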

Individual models are completed with a dense layer for classification, using a softmax activation function. We use stochastic gradient descent over shuffled mini-batches with Adam (Kingma and Ba 2014) and cross-entropy loss as objective function for classification. The parameters of our model are shown in Table 2.
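The classification head described above reduces to a dense layer with a softmax activation, trained against a cross-entropy objective. A NumPy illustration of the two functions (the model itself would of course use a deep-learning framework's built-ins):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(probs, true_idx):
    # Negative log-likelihood of the gold class label
    return -np.log(probs[true_idx])

logits = np.array([2.0, 0.5, -1.0])  # dense-layer output for 3 city classes
probs = softmax(logits)
loss = cross_entropy(probs, true_idx=0)
print(probs.argmax(), float(loss))
```

Minimizing this loss over shuffled mini-batches with Adam pushes probability mass toward the gold city label.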
Fig. 1.

Architectures for city prediction.

Table 2.

Selected parameter settings

| Parameter                 | Value |
|---------------------------|-------|
| Description embedding dim | 100   |
| Text embedding dim        | 100   |
| Location embedding dim    | 50    |
| Timezone embedding dim    | 50    |
| Name embedding dim        | 100   |

The WNUT’16 task requires the model to predict class labels and longitude/latitude pairs. To account for this, we predict the mean city longitude/latitude location given the class label. For user-level prediction, we classify all messages individually and predict the city label with the highest probability over all messages.
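Both steps can be sketched as follows. The city coordinates and probabilities are made up for illustration, and we read "highest probability over all messages" as the per-class maximum across a user's messages; the authors' exact aggregation may differ:

```python
import numpy as np

# Hypothetical mean city-center coordinates per class label
city_coords = {"melbourne": (-37.81, 144.96), "jakarta": (-6.21, 106.85)}
labels = list(city_coords)

# Per-message class probabilities for one user (3 messages, 2 city classes)
message_probs = np.array([
    [0.60, 0.40],
    [0.20, 0.80],
    [0.05, 0.95],
])

# Tweet level: classify one message and emit the mean coordinates of its city
tweet_label = labels[int(message_probs[0].argmax())]
tweet_coords = city_coords[tweet_label]

# User level: pick the city label with the highest probability over all messages
user_label = labels[int(message_probs.max(axis=0).argmax())]
print(tweet_label, tweet_coords, user_label)
```

Here the first message alone points to one city, while aggregating over all three messages yields a different, more confident user-level prediction.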

3.1 Model Combination

The internal representations for all different resources (i.e., text, user-description, user-location, user-name, user-timezone, links, UTC offset, user lang, tweet-time and source) are concatenated to build a final tweet representation. We then evaluate two training strategies: In the first training regime, we train the combined model from scratch. The parameters for all word embeddings, as well as all network layers, are initialized randomly. The parameters of the full model including the softmax layer combining the output of the individual LSTM– and metadata– models are learned jointly. For the second strategy, we first train each model separately, and then keep their parameters fixed while training only the final softmax layer.
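The concatenation step can be sketched with hypothetical representation sizes (128/64/16 below are illustrative, not the model's actual dimensions). In the second strategy, only the final softmax parameters `W` and `b` would receive gradient updates, while the per-source parameters stay frozen:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed-size outputs of the pretrained per-source models
text_repr = rng.normal(size=128)      # LSTM output for the tweet text
location_repr = rng.normal(size=64)   # LSTM output for the user-location field
utc_repr = rng.normal(size=16)        # embedding of the UTC-offset category

# Concatenate into a single tweet representation
tweet_repr = np.concatenate([text_repr, location_repr, utc_repr])  # shape (208,)

# Final softmax layer over the 3,362 city classes; in the "fixed" strategy
# these are the only trainable parameters
n_classes = 3362
W = rng.normal(size=(n_classes, tweet_repr.size)) * 0.01
b = np.zeros(n_classes)
logits = W @ tweet_repr + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(tweet_repr.shape, probs.shape)
```

Freezing the pretrained parts reduces the number of parameters trained jointly, which is consistent with the better results the paper reports for the full-fixed model.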

4 Results

The individual performance of our different models is shown in Table 3. As a simple baseline, we predict the city label most frequently observed in the training data (Jakarta, Indonesia). According to our bottom-up analysis, the user-location metadata is the most informative single source for tweet- and user-level location prediction. Using the text alone, we can correctly predict the location of 19.5% of all tweets, with a median distance of 2,190 km to the correct location. Combining pretrained models also improves performance on all three evaluation metrics compared to training the combined model from scratch.
Table 3.

Tweet- and user-level results ranked by tweet median error distance (in kilometers). Individual best results for all three criteria are highlighted in bold face. Full-scratch refers to a merged model trained from scratch, whereas the weights of the full-fixed model are only retrained where applicable. The baseline predicts the location most frequently observed in the training data (Jakarta).

| Model        | Tweet Acc | Tweet Median | Tweet Mean  | User Acc  | User Median | User Mean |
|--------------|-----------|--------------|-------------|-----------|-------------|-----------|
| Location     | 0.361     | 205.6        | 4,538.0     | 0.445     | 43.9        | 3,831.7   |
| Text         | 0.195     | 2,190.6      | 4,472.9     | 0.321     | 263.8       | 2,570.9   |
| Description  | 0.087     | 3,817.2      | 6,060.2     | 0.098     | 3,296.9     | 5,880.0   |
| User-name    | 0.057     | 3,849.0      | 5,930.1     | 0.059     | 4,140.4     | 6,107.6   |
| Timezone     | 0.058     | 5,268.0      | 5,530.1     | 0.061     | 5,470.5     | 5,465.5   |
| User-lang    | 0.061     | 6,465.1      | 7,310.2     | 0.047     | 8,903.7     | 8,525.1   |
| Links        | 0.032     | 7,601.7      | 6,980.5     | 0.045     | 6,687.4     | 6,546.8   |
| UTC          | 0.046     | 7,698.1      | 6,849.0     | 0.051     | 3,883.4     | 6,422.6   |
| Source       | 0.045     | 8,005.0      | 7,516.8     | 0.045     | 6,926.3     | 6,923.5   |
| Tweet-time   | 0.028     | 8,867.6      | 8,464.9     | 0.024     | 11,720.6    | 10,363.2  |
| Full-scratch | 0.417     | 59.0         | 1,616.4     | 0.513     | 17.8        | 1,023.9   |
| Full-fixed   | **0.430** | **47.6**     | **1,179.4** | **0.530** | **14.9**    | **838.5** |
| Baseline     | 0.028     | 11,723.0     | 10,264.3    | 0.024     | 11,771.5    | 10,584.4  |

For tweet-level prediction, our best merged model outperforms the best submission (FujiXerox.2) in terms of accuracy, median, and mean distance by 2.1 percentage points, 21.9 km, and 613.1 km, respectively. The ensemble learning method (csiro) outperforms our best model in terms of accuracy by 0.6 percentage points, but our model performs considerably better on median and mean distance, by 27.1 and 1,358.8 km, respectively. Additionally, the approach of csiro requires several dedicated services, such as GeoNames gazetteers, time-zone-to-GeoNames mappings, an IP-to-country resolver, and customized scrapers for social media websites. The authors describe custom link handling for FourSquare, Swarm, Path, Facebook, and Instagram. On our training data we observed that these websites account for 1,941,079 (87.5%) of all 2,217,267 shared links. It is therefore tempting to speculate that customized scrapers for these websites could further boost our location prediction results.

As team cogeo uses only the text of a tweet, the results of cogeo.1 are directly comparable with our text model. Our text model outperforms this approach in terms of accuracy, median, and mean distance to the gold standard by 4.9 percentage points, 1,234 km, and 866 km, respectively.

For user-level prediction, our method performs on a par with the individual best results collected from the three top team submissions (FujiXerox.2, csiro.1, and FujiXerox.1). A notable difference is the mean predicted error distance, where our model outperforms the best model by 125.3 km.

5 Conclusion

We presented a neural network architecture for predicting city labels and geo-coordinates of tweets. We focus on the classification task and derive longitude/latitude information from the predicted city label. We evaluated models for individual Twitter (meta)data fields in a bottom-up fashion and identified highly location-indicative fields. The proposed combination of individual models requires no customized text preprocessing, specific website crawlers, database lookups, or IP-to-country resolution, while achieving state-of-the-art performance on a publicly available dataset. For better comparability, source code and pretrained models are freely available to the community.

As future work, we plan to incorporate images as another type of metadata for location prediction using the approach presented by Simonyan and Zisserman (2014).


Acknowledgments

This research was partially supported by the German Federal Ministry of Economics and Energy (BMWi) through the projects SD4M (01MD15007B) and SDW (01MD15010A) and by the German Federal Ministry of Education and Research (BMBF) through the project BBDC (01IS14013E).

References

  1. Chi, L., Lim, K.H., Alam, N., Butler, C.J.: Geolocation prediction in Twitter using location indicative words and textual features. In: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), Osaka, Japan, pp. 227–234, December 2016. http://aclweb.org/anthology/W16-3930
  2. Davis, M., Whistler, K., Dürst, M.: Unicode Normalization Forms. Technical report, Unicode Consortium (2001)
  3. Guo, C., Berkhahn, F.: Entity embeddings of categorical variables. CoRR, abs/1604.06737 (2016)
  4. Han, B., Baldwin, T.: Lexical normalisation of short text messages: makn sens a #Twitter. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, Stroudsburg, PA, USA, vol. 1, pp. 368–378 (2011). http://dl.acm.org/citation.cfm?id=2002472.2002520. ISBN 978-1-932432-87-9
  5. Han, B., Cook, P., Baldwin, T.: Geolocation prediction in social media data by finding location indicative words. In: COLING 2012, 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, Mumbai, India, pp. 1045–1062, 8–15 December 2012. http://aclweb.org/anthology/C/C12/C12-1064.pdf
  6. Han, B., Cook, P., Baldwin, T.: Text-based Twitter user geolocation prediction. J. Artif. Int. Res. 49(1), 451–500 (2014). http://dl.acm.org/citation.cfm?id=2655713.2655726. ISSN 1076-9757
  7. Han, B., Rahimi, A., Derczynski, L., Baldwin, T.: Twitter geolocation prediction shared task of the 2016 workshop on noisy user-generated text. In: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), Osaka, Japan, pp. 213–217, December 2016. http://aclweb.org/anthology/W16-3928
  8. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735. ISSN 0899-7667
  9. Ioffe, S., Szegedy, C.: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. CoRR, abs/1502.03167 (2015). http://arxiv.org/abs/1502.03167
  10. Jayasinghe, G., Jin, B., Mchugh, J., Robinson, B., Wan, S.: CSIRO Data61 at the WNUT Geo Shared Task. In: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), Osaka, Japan, pp. 218–226, December 2016. http://aclweb.org/anthology/W16-3929
  11. Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of Tricks for Efficient Text Classification. CoRR, abs/1607.01759 (2016). http://arxiv.org/abs/1607.01759
  12. Jurgens, D., Finethy, T., McCorriston, J., Xu, Y.T., Ruths, D.: Geolocation prediction in Twitter using social networks: a critical analysis and review of current practice. In: ICWSM, pp. 188–197 (2015)
  13. Kingma, D.P., Ba, J.: Adam: A Method for Stochastic Optimization. CoRR, abs/1412.6980 (2014). http://arxiv.org/abs/1412.6980
  14. Liu, J., Inkpen, D.: Estimating user location in social media with stacked denoising auto-encoders. In: Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, Denver, Colorado, pp. 201–210, June 2015. http://www.aclweb.org/anthology/W15-1527
  15. Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed Representations of Words and Phrases and their Compositionality. CoRR, abs/1310.4546 (2013). http://arxiv.org/abs/1310.4546
  16. Miura, Y., Taniguchi, M., Taniguchi, T., Ohkuma, T.: A simple scalable neural networks based model for geolocation prediction in Twitter. In: Proceedings of the 2nd Workshop on Noisy User-generated Text (WNUT), Osaka, Japan, pp. 235–239, December 2016. http://aclweb.org/anthology/W16-3931
  17. Paul, M.J., Dredze, M., Broniatowski, D.: Twitter improves influenza forecasting. PLOS Currents Outbreaks 6 (2014)
  18. Power, R., Robinson, B., Ratcliffe, D.: Finding fires with Twitter. In: Australasian Language Technology Association Workshop, vol. 80 (2013)
  19. Rahimi, A., Cohn, T., Baldwin, T.: Pigeo: a python geotagging tool. In: Proceedings of ACL-2016 System Demonstrations, Berlin, Germany, pp. 127–132, August 2016. http://anthology.aclweb.org/P16-4022
  20. Simonyan, K., Zisserman, A.: Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR, abs/1409.1556 (2014). http://arxiv.org/abs/1409.1556
  21. Sloan, L., Morgan, J., Housley, W., Williams, M., Edwards, A., Burnap, P., Rana, O.: Knowing the tweeters: deriving sociologically relevant demographics from Twitter. Sociol. Res. Online, 18 (3) (2013).  https://doi.org/10.5153/sro.3001. ISSN 1360–7804
  22. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014). http://dl.acm.org/citation.cfm?id=2627435.2670313. ISSN 1532-4435
  23. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, New York, NY, USA, pp. 1096–1103. ACM (2008).  https://doi.org/10.1145/1390156.1390294. ISBN 978-1-60558-205-4
  24. Ye, M., Yin, P., Lee, W.-C.: Location recommendation for location-based social networks. In: Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2010, pp. 458–461, New York, NY, USA. ACM (2010).  https://doi.org/10.1145/1869790.1869861. ISBN 978-1-4503-0428-3

Copyright information

© The Author(s) 2018

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

Language Technology Lab, DFKI GmbH, Berlin, Germany
