Abstract
A tweet, possessing various facets, is created at the speed of thought, propagated in real time, and produces social interchange on an international scale. As a result, users want Twitter mining results displayed on a map, so that they can search for trending topics or find out what is being talked about among users. However, the sparsity of location information makes position-based analysis genuinely difficult. To run Twitter mining on all Korean users, this study used the firehose level, i.e., the full (100 %) Twitter data stream, and introduced a new spatial indicator to overcome the sparsity of location information. The study also proposed an algorithm for processing firehose data, together with solutions to the study's limitations. Compared with the method that relies on geotags alone, the conventional approach (spritzer-level data with a supervised method) inferred positions for 44 times as many tweets, whereas the method used in this study increased the number of inferred positions 680-fold. Among the clustering algorithms compared, K-Center Clustering inferred the largest number of user residential locations. The ultimate goal of the study is for Twitter data, including the massive volume of location information inferred and created in real time, to serve as a means of city monitoring once the study's remaining limitation, namely the automated refining of unnecessary words in profile location information and in Twitter mining, is overcome.
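The abstract does not say how K-Center Clustering was applied to infer residential locations. As a minimal sketch, assuming each user's tweets have already been reduced to (latitude, longitude) points, the classic greedy farthest-first k-center heuristic picks k representative points, and the center whose cluster holds the most tweets stands in as the residential estimate. The function names, the value of k, the sample coordinates, and the "largest cluster = home" rule are hypothetical illustrations, not the authors' procedure.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def greedy_k_center(points: List[Point], k: int) -> List[int]:
    """Farthest-first traversal: return indices of up to k center points."""
    if not points or k <= 0:
        return []
    centers = [0]  # start from the first point
    # distance from every point to its nearest chosen center so far
    nearest = [haversine_km(p, points[0]) for p in points]
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=lambda i: nearest[i])
        if nearest[far] == 0.0:
            break  # every point already coincides with a center
        centers.append(far)
        nearest = [min(d, haversine_km(p, points[far])) for d, p in zip(nearest, points)]
    return centers


def infer_home(points: List[Point], k: int = 3) -> Point:
    """Hypothetical rule: the center of the k-center cluster with the most tweets."""
    if not points:
        raise ValueError("no points to cluster")
    centers = greedy_k_center(points, k)
    counts = {c: 0 for c in centers}
    for p in points:
        # assign each tweet to its nearest center
        c = min(centers, key=lambda ci: haversine_km(p, points[ci]))
        counts[c] += 1
    best = max(centers, key=lambda c: counts[c])
    return points[best]


# Hypothetical tweets: three near Gangnam, one near Hongdae, one in Busan.
pts = [(37.501, 127.025), (37.498, 127.028), (37.503, 127.030),
       (37.556, 126.923), (35.180, 129.075)]
print(infer_home(pts, k=3))  # (37.501, 127.025)
```

Farthest-first traversal is a 2-approximation for the k-center objective, but it is sensitive to outliers: a single distant tweet (here, Busan) becomes its own center. That is why this sketch takes the most populated cluster rather than the first center chosen.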
Notes
Data that streams in real time or changes dynamically.
Searching for topics prevalent in tweets, or finding out what is being talked about among people [2].
An indicator that shows the location value of a tweet [6].
JSON (JavaScript Object Notation) is a data-interchange format. It is easy for people to read and write, and easy for machines to parse and generate.
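To make the streaming, JSON, and location-indicator notes concrete, the sketch below reads newline-delimited tweets, as a real-time stream delivers them, and classifies each one by the location evidence it carries: a point geotag, free-text profile location, or nothing. It assumes Twitter's v1.1 tweet JSON fields (`coordinates` as a GeoJSON point in [longitude, latitude] order, `user.location` as free text). The geotag-first fallback order, the function names, and the sample records are hypothetical and are not the study's spatial indicator.

```python
import io
import json
from typing import Iterable, Optional, Tuple, Union

Hint = Union[Tuple[float, float], str]

# Hypothetical newline-delimited tweets, trimmed to the fields used here.
STREAM = io.StringIO(
    '{"id_str": "1", "coordinates": {"type": "Point", "coordinates": [126.978, 37.566]},'
    ' "user": {"location": "Seoul"}}\n'
    '{"id_str": "2", "coordinates": null, "user": {"location": "Seoul, Korea"}}\n'
    '{"id_str": "3", "coordinates": null, "user": {"location": ""}}\n'
)


def location_hint(tweet: dict) -> Optional[Hint]:
    """Return (lat, lon) from a point geotag, else the trimmed profile location text."""
    geo = tweet.get("coordinates")
    if geo and geo.get("type") == "Point":
        lon, lat = geo["coordinates"]  # GeoJSON order is [longitude, latitude]
        return (lat, lon)
    profile = (tweet.get("user") or {}).get("location") or ""
    # Profile text still needs refining and geocoding before it is a position.
    return profile.strip() or None


def tally(lines: Iterable[str]) -> dict:
    """Count tweets by the kind of location evidence they carry."""
    counts = {"geotag": 0, "profile": 0, "none": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive lines carry no tweet
        hint = location_hint(json.loads(line))
        if isinstance(hint, tuple):
            counts["geotag"] += 1
        elif hint:
            counts["profile"] += 1
        else:
            counts["none"] += 1
    return counts


print(tally(STREAM))  # {'geotag': 1, 'profile': 1, 'none': 1}
```

Because each line is parsed and discarded independently, memory use stays flat regardless of how long the stream runs.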
References
Lee, B. Y., Lim, J. T., & Yoo, J. (2013). Utilization of social media analysis using big data. The Journal of the Korea Contents Association, 13(2), 211–219.
Russell, M. A. (2013). Mining the social web: data mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More (2nd ed.). Sebastopol: O’Reilly Media.
Guo, J., Zhang, P., & Guo, L. (2012). Mining hot topics from Twitter streams. Procedia Computer Science, 9, 2008–2011.
Bifet, A. (2013). Mining big data in real time. Informatica, 37, 15–20.
Kim, M. G., & Koh, J. H. (2016). Recent research trends for geospatial information explored by Twitter data. Spatial Information Research, 24(2), 65–73. doi:10.1007/s41324-016-0007-0.
Ajao, O., Hong, J., & Liu, W. (2015). A survey of location inference techniques on Twitter. Journal of Information Science, 41(6), 855–864.
Blanford, J., Huang, Z., Savelyev, A., & MacEachren, A. M. (2015). Geo-located tweets. Enhancing mobility maps and capturing cross-border movement. PLoS One, 10(6), e0129202.
Dredze, M., Paul, M. J., Bergsma, S., & Tran, H. (2013). Carmen: A Twitter geolocation system with applications to public health. In AAAI workshop on expanding the boundaries of health informatics using AI (HIAI) (pp. 20–24).
Nelson, J. K., Quinn, S., Swedberg, B., Chu, W., & MacEachren, A. M. (2015). Geovisual analytics approach to exploring public political discourse on Twitter. ISPRS International Journal of Geo-Information, 4(1), 337–366.
Tweetping Website. https://www.tweetping.net. Accessed 1 April 2016.
LIVE Singapore Website. http://senseable.mit.edu/livesingapore/index.html. Accessed 1 April 2016.
SK Telecom Smart Insight Webpage. http://www.smartinsight.co.kr. Accessed 1 April 2016.
Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the sample good enough? Comparing data from Twitter's streaming API with Twitter's firehose. In Proceedings of ICWSM.
Luo, F., Cao, G., Mulligan, K., & Li, X. (2015). Explore spatiotemporal and demographic characteristics of human mobility via twitter: A case study of Chicago. arXiv preprint arXiv:1508.00188.
Frias-Martinez, V., Sae-Tang, A., & Frias-Martinez, E. (2014). To call, or to tweet? Understanding 3-1-1 citizen complaint behaviors. In SocialCom 2014: The sixth IEEE/ASE international conference on social computing. http://galaxy.cs.lamar.edu/~kmakki/2014-ASE/2014%20ASE%20Conference%20Stanford%20University%20Proceedings/Proceedings.pdf
Zhang, J., Sun, J., Zhang, R., & Zhang, Y. (2015). Your actions tell where you are: Uncovering Twitter users in a metropolitan area. In IEEE Conference on Communications and Network Security (CNS), 2015 (pp. 424–432).
Yim, J. Y., Ha, H. S., & Hwang, B. Y. (2015). A method for detecting event location based on similar keyword extraction in tweet text. Journal of Korea Spatial Information Society, 23(5), 1–7.
Gonzalez, R., Figueroa, G., & Chen, Y. S. (2012). Tweolocator: a non-intrusive geographical locator system for twitter. In Proceedings of the 5th ACM SIGSPATIAL international workshop on location-based social networks (pp. 24–31).
Kotzias, D., Lappas, T., & Gunopulos, D. (2014). Addressing the Sparsity of Location Information on Twitter. In EDBT/ICDT Workshops (pp. 339–346).
Valkanas, G., & Gunopulos, D. (2012). Location extraction from social networks with commodity software and online data. In IEEE 12th international conference on data mining workshops (ICDMW), 2012 (pp. 827–834).
Lim, H. J., & Park, S. H. (2015). A tentative approach for regional futures strategy with big data. The Korean Cadastre Information Association, 17(1), 75–90.
Park, W. J., & Yu, K. Y. (2015). Spatial clustering analysis based on text mining of location based social media data. Journal of the Korean Society for Geospatial Information Science, 23(2), 89–96.
Kang, A. T., & Kang, Y. O. (2015). Location inference of Twitter users using timeline data. Journal of Korea Spatial Information Society, 23(2), 69–81.
Han, S. G. (2014). Social media. Melbourne: Acorn Publication.
Li, R., Wang, S., Deng, H., Wang, R., & Chang, K. C. (2012). Towards social user profiling: Unified and discriminative influence model for inferring home locations. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1023–1031).
Abdelhaq, H., Gertz, M., & Sengstock, C. (2013). Spatio-temporal characteristics of bursty words in Twitter streams. In Proceedings of the 21st ACM SIGSPATIAL international conference on advances in geographic information systems (pp. 194–203).
120 Dasan Seoul Call Center Webpage. http://120dasan.seoul.go.kr/foreign/english.html. Accessed 1 April 2016.
Seoul Smart Report Application. https://play.google.com/store/apps/details?id=kr.go.seoul.seoulSmartReport&hl=ko. Accessed 1 April 2016.
K-Center Clustering. http://trendsofcode.net
Additional information
This article is a condensed form of the first author’s Ph.D. thesis at the University of Seoul.
Cite this article
Kim, M.G., Kang, Y.O., Lee, J.Y. et al. Inferring tweet location inference for twitter mining. Spat. Inf. Res. 24, 421–435 (2016). https://doi.org/10.1007/s41324-016-0041-y
DOI: https://doi.org/10.1007/s41324-016-0041-y