Home location inference from sparse and noisy data: models and applications

Frontiers of Information Technology & Electronic Engineering

Abstract

Accurate home location is increasingly important for urban computing. Existing methods either rely on continuous (and expensive) Global Positioning System (GPS) data or suffer from poor accuracy. In particular, the sparse and noisy nature of social media data poses serious challenges in pinpointing where people live at scale. We revisit this research topic and infer home location within 100 m×100 m squares at 70% accuracy for 76% and 71% of active users in New York City and the Bay Area, respectively. To the best of our knowledge, this is the first time home location has been detected at such a fine granularity using sparse and noisy data. Since people spend a large portion of their time at home, our model enables novel applications. As an example, we focus on modeling people’s health at scale by linking their home locations with publicly available statistics, such as education disparity. Results in multiple geographic regions demonstrate both the effectiveness and added value of our home localization method and reveal insights that eluded earlier studies. In addition, we are able to discover the real buzz in the communities where people live.
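
To make the 100 m×100 m granularity concrete, the following is a minimal sketch of a naive baseline, not the model proposed in this article: it maps geotagged points to approximate 100 m grid cells and returns the most frequently visited cell. The function names (`grid_cell`, `naive_home_cell`), the `ref_lat` parameter, and the constant of roughly 111,320 m per degree of latitude are illustrative assumptions; the flat-earth approximation used here is only reasonable within a single metropolitan area.

```python
import math
from collections import Counter

CELL_METERS = 100.0              # side length of one grid square
METERS_PER_DEG_LAT = 111_320.0   # rough approximation (assumption)


def grid_cell(lat, lon, ref_lat):
    """Map (lat, lon) in degrees to an integer (row, col) grid index.

    ref_lat is a hypothetical reference latitude for the city; it fixes
    the east-west scale so that all cells are about 100 m wide.
    """
    meters_per_deg_lon = METERS_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    row = math.floor(lat * METERS_PER_DEG_LAT / CELL_METERS)
    col = math.floor(lon * meters_per_deg_lon / CELL_METERS)
    return row, col


def naive_home_cell(points, ref_lat):
    """Return the grid cell that contains the most points.

    points is an iterable of (lat, lon) pairs for one user. This is a
    simple most-frequent-cell heuristic, not the article's method.
    """
    counts = Counter(grid_cell(lat, lon, ref_lat) for lat, lon in points)
    if not counts:
        raise ValueError("at least one point is required")
    cell, _ = counts.most_common(1)[0]
    return cell
```

A most-frequent-cell heuristic of this kind is sensitive to exactly the sparsity and noise the abstract describes, which is the gap a dedicated inference model is meant to close.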

Author information

Corresponding author

Correspondence to Jie-bo Luo.

Additional information

Project supported by the Goergen Institute for Data Science, New York State and the Xerox Foundation

ORCID: Tian-ran HU, http://orcid.org/0000-0003-0086-2447

Dr. Jie-bo LUO, corresponding author of this invited research article, joined the University of Rochester in Fall 2011 after over 15 years at Kodak Research Laboratories, where he was a senior principal scientist leading research and advanced development. He has been involved in numerous technical conferences, including serving as the program co-chair of ACM Multimedia 2010 and IEEE CVPR 2012. He is the Editor-in-Chief of the Journal of Multimedia, and has served on the editorial boards of IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, Pattern Recognition, Machine Vision and Applications, and Journal of Electronic Imaging. He is a Fellow of the SPIE, IEEE, and IAPR.

About this article

Cite this article

Hu, Tr., Luo, Jb., Kautz, H. et al. Home location inference from sparse and noisy data: models and applications. Frontiers Inf Technol Electronic Eng 17, 389–402 (2016). https://doi.org/10.1631/FITEE.1500385

  • DOI: https://doi.org/10.1631/FITEE.1500385
