Big geodata mining: Objective, connotations and research issues

Journal of Geographical Sciences

Abstract

This paper discusses the objective, connotations and research issues of big geodata mining in order to clarify its significance for geographical research. Big geodata may be categorized into two domains: big earth observation data and big human behavior data. Beyond the “5Vs” (volume, velocity, value, variety and veracity), big geodata is described by five further features: granularity, scope, density, skewness and precision. On this basis, the essence of mining big geodata has four aspects. First, flow space, in which flows replace the points of traditional space, will become the new form of representation for big human behavior data. Second, the objectives of mining big geodata are spatial patterns and spatial relationships. Third, the spatiotemporal distributions of big geodata can be viewed as overlays of multiple geographic patterns, and the characteristics of the data, namely heterogeneity and homogeneity, may change with scale. Fourth, data mining can be seen as a tool for discovering geographic patterns, and the patterns revealed may be attributed to human-land relationships. In view of the mining objective, big geodata mining methods fall into two types: classification mining and relationship mining. Future research faces a number of issues, including the aggregation and connection of big geodata, the effective evaluation of mining results, and the challenge of mining “non-trivial” knowledge.
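
The contrast between a point view and a flow view of human behavior data can be illustrated with a minimal sketch. The trip records, the cell labels and the function names below are hypothetical and are not taken from the paper; the sketch only shows, under those assumptions, how the same records can be summarized either per location or per directed origin-destination pair.

```python
from collections import Counter

# Hypothetical trip records as (origin_cell, destination_cell) pairs.
# The cell labels are illustrative only and do not come from the paper.
trips = [
    ("A", "B"),
    ("A", "B"),
    ("B", "C"),
    ("A", "C"),
    ("B", "A"),
    ("A", "B"),
]


def point_counts(records):
    """Point view: count how often each cell occurs as an origin or a destination."""
    counts = Counter()
    for origin, destination in records:
        counts[origin] += 1
        counts[destination] += 1
    return counts


def flow_counts(records):
    """Flow view: count trips per directed origin-destination pair."""
    return Counter(records)


points = point_counts(trips)
flows = flow_counts(trips)

print("Point view:", points.most_common())
print("Flow view:", flows.most_common())
```

In this toy data the point view cannot separate cells A and B, which are equally busy, whereas the flow view shows that most movement runs from A to B. This is one simple way to read the idea of flows replacing points; it is not the authors' method.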


Author information

Corresponding author

Correspondence to Tao Pei.

Additional information

Foundation: National Natural Science Foundation of China, No.41525004, No.41421001

Author: Pei Tao (1972-), Professor, specialized in big geodata mining.

About this article

Cite this article

Pei, T., Song, C., Guo, S. et al. Big geodata mining: Objective, connotations and research issues. J. Geogr. Sci. 30, 251–266 (2020). https://doi.org/10.1007/s11442-020-1726-7

  • DOI: https://doi.org/10.1007/s11442-020-1726-7