A Rule-Based Approach for Detecting Location Leaks of Short Text Messages

  • Hoang-Quoc Nguyen-Son
  • Minh-Triet Tran
  • Hiroshi Yoshiura
  • Noboru Sonehara
  • Isao Echizen
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 228)


Millions of people now share messages via online social networks, and some of these messages may contain sensitive information. An adversary can collect these freely available messages and analyze them specifically for privacy leaks, such as disclosure of the user's location. Unlike other approaches, which try to detect such leaks using complete message streams, we propose a rule-based approach that detects location leaks in single, very short messages. We evaluated our approach on 2817 tweets from the Tweets2011 data set. In detecting whether a message reveals the user's location, it performs significantly better (accuracy = 84.95 %) than a machine-learning baseline and three heuristic-based extensions. Our approach is not limited to online social network messages; it can also be extended to other areas (such as email, military, and health) and to other languages.
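To make the idea of a rule-based detector that works on a single short message concrete, the following Python sketch flags a message as a location leak when it both names a place and contains a first-person presence cue. It is a minimal illustration under stated assumptions, not the authors' method: the place list `GAZETTEER`, the patterns in `PRESENCE_PATTERNS`, the `NON_FACTUAL` filter, and the functions `mentions_place` and `leaks_location` are all hypothetical, and a practical system would more likely find place names with a named-entity recognizer than with a hand-written list.

```python
import re

# Hypothetical mini place list; a real system would more likely use a
# named-entity recognizer or a large place-name database.
GAZETTEER = {"tokyo", "paris", "central park", "starbucks"}

# Hypothetical first-person presence cues ("I'm at ...", "just landed in ...").
PRESENCE_PATTERNS = [
    r"\b(i'?m|i am|we'?re|we are)\s+(at|in)\b",
    r"\b(just )?(arrived|landed|checked in)\s+(at|in)\b",
    r"\bhere (at|in)\b",
]

# Hypothetical filter for negated or wishful statements ("not", "don't", "wish").
# "n't" is matched separately because \b does not fire inside "don't".
NON_FACTUAL = re.compile(r"\b(not|never|wish)\b|n't\b")


def mentions_place(text: str) -> bool:
    """Return True if any gazetteer entry occurs as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(place) + r"\b", text) for place in GAZETTEER)


def leaks_location(message: str) -> bool:
    """Apply the rules in order: place mention, no negation, presence cue."""
    text = message.lower()
    if not mentions_place(text):
        return False  # Rule 1: no recognizable place, no leak.
    if NON_FACTUAL.search(text):
        return False  # Rule 2: negated or wishful statements are ignored.
    # Rule 3: the writer must place themselves at the location.
    return any(re.search(pattern, text) for pattern in PRESENCE_PATTERNS)


if __name__ == "__main__":
    for msg in ["Just landed in Tokyo!", "Paris is lovely in spring"]:
        print(msg, "->", leaks_location(msg))
```

For example, "Just landed in Tokyo!" is flagged, while "Paris is lovely in spring" is not, because it names a place without placing the writer there. Hand-written rules like these are transparent and need no training data, but their coverage depends entirely on how complete the place list and cue patterns are.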


Location leak · Online social network · Short text message



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Hoang-Quoc Nguyen-Son (1)
  • Minh-Triet Tran (2)
  • Hiroshi Yoshiura (3)
  • Noboru Sonehara (4)
  • Isao Echizen (1, 4)

  1. SOKENDAI (The Graduate University for Advanced Studies), Tokyo, Japan
  2. University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
  3. University of Electro-Communications, Tokyo, Japan
  4. National Institute of Informatics, Tokyo, Japan
