Abstract
This paper discusses the objective, connotations and research issues of big geodata mining in order to clarify its significance for geographical research. Big geodata may be categorized into two domains: big earth observation data and big human behavior data. In addition to the “5Vs” (volume, velocity, value, variety and veracity), big geodata can be described by five further features: granularity, scope, density, skewness and precision. On this basis, the essence of big geodata mining comprises four aspects. First, flow space, in which flows replace the points of traditional space, will become the new form of representation for big human behavior data. Second, the objectives of mining big geodata are spatial patterns and spatial relationships. Third, the spatiotemporal distributions of big geodata can be viewed as overlays of multiple geographic patterns, and the characteristics of the data, namely heterogeneity and homogeneity, may change with scale. Fourth, data mining can be seen as a tool for discovering geographic patterns, and the patterns revealed may be attributed to human-land relationships. In terms of the mining objective, big geodata mining methods fall into two types: classification mining and relationship mining. Future research faces a number of issues, including the aggregation and connection of big geodata, the effective evaluation of mining results, and the challenge of revealing “non-trivial” knowledge through mining.
Additional information
Foundation: National Natural Science Foundation of China, No. 41525004, No. 41421001
Author: Pei Tao (1972–), Professor, specialized in big geodata mining.
Cite this article
Pei, T., Song, C., Guo, S. et al. Big geodata mining: Objective, connotations and research issues. J. Geogr. Sci. 30, 251–266 (2020). https://doi.org/10.1007/s11442-020-1726-7
DOI: https://doi.org/10.1007/s11442-020-1726-7