GeoInformatica, Volume 22, Issue 3, pp 563–587

Strategies for combining Twitter users geo-location methods

  • Silvio Ribeiro Jr
  • Gisele L. Pappa

Abstract

Twitter has become a major player in the social media scene, with over half a billion users and over 500 million tweets published daily. With such abundant data, researchers saw an opportunity to use it for monitoring events and tracking epidemics. In these applications, knowing the location of the user is essential. However, most self-reported location information is difficult to process, and barely 1% of all published tweets are geolocated. Hence, user location inference is often performed by analyzing publicly available information from the user's profile and tweets. In this work, we evaluate and compare 16 approaches for user location inference based on different information sources, including interaction networks and text from tweets. We show that methods working with the user friendship network obtain higher accuracy and recall than the other methods. Building on these results, we examine the agreement between pairs of methods regarding the predicted location and the users they cover. We find that most methods disagree in their inferences while covering different sets of users. These results open up an opportunity to combine different methods in order to improve location accuracy and user recall. We propose four methods for combining the outputs of the evaluated methods. Two of them, one based on a weighted voting scheme (GAVe) and another based on a meta decision tree, cover at least 98% of the users in the dataset while locating 75% of them within 100 km of their real location.
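To make the idea of combining method outputs concrete, the following is a minimal sketch of a weighted vote over per-method location predictions, together with a coverage and accuracy-within-100-km check. It is an illustration only and not the GAVe method itself: the abstract does not say how weights are obtained, so fixed per-method weights are assumed here. The function names, the method names, the city labels and the choice to compute accuracy over covered users only are all hypothetical.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def weighted_vote(predictions, weights):
    """Pick the location label with the largest total weight.

    predictions maps a method name to a label, or to None when the method
    does not cover the user. Returns None if no method made a prediction.
    The weights are assumed to be given; how to learn them is out of scope.
    """
    scores = defaultdict(float)
    for method, label in predictions.items():
        if label is not None:
            scores[label] += weights.get(method, 0.0)
    return max(scores, key=scores.get) if scores else None


def coverage_and_accuracy(predicted, truth, coords, radius_km=100.0):
    """Return (share of users covered, share of covered users within radius_km)."""
    covered = [u for u in truth if predicted.get(u) is not None]
    hits = sum(
        haversine_km(coords[predicted[u]], coords[truth[u]]) <= radius_km
        for u in covered
    )
    coverage = len(covered) / len(truth) if truth else 0.0
    accuracy = hits / len(covered) if covered else 0.0
    return coverage, accuracy


# Hypothetical example data: three methods, two candidate cities.
COORDS = {"Belo Horizonte": (-19.92, -43.94), "Sao Paulo": (-23.55, -46.63)}
WEIGHTS = {"network": 0.6, "text": 0.3, "profile": 0.2}
```

Because each method covers a different set of users, a user is left without a prediction only when every method abstains, which is how a combination of this kind can raise coverage beyond that of any single method.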

Keywords

Location inference · Twitter · Social networks · Geoinference · Methods combination

Notes

Acknowledgments

This work was partially funded by CAPES, CNPq and FAPEMIG, all Brazilian Research Agencies. The authors would like to thank David Jurgens for providing the source codes for the four network-based methods.

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
