Abstract
Establishing connections within large survey datasets remains a challenge. This study aims to uncover latent connections within extensive survey data and argues for a more integrated approach to deciphering the relationships among survey elements. Using computational semantics, machine learning, and spatiotemporal models, we developed a comprehensive database that extracts and characterizes features from a large number of survey studies and highlights relationships among metadata elements such as terms, variables, and topics. The derived relationships are stored systematically as connectivity matrices, which quantify the degree of interconnectedness among features and provide insight into how they interact. The system thus functions as a digital geographical data librarian. Beyond serving as a storage tool, it supports interdisciplinary research: researchers can discern connections between survey elements and identify the most influential paths among features according to different criteria. Such a tool fosters cross-disciplinary integration and reveals potential ties between seemingly unrelated survey attributes, which may open new avenues for understanding and application.
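To make the idea of a connectivity matrix and an "influential path" concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy set of surveys (the `surveys` dictionary and every term in it are hypothetical), measures term-to-term connectivity with Jaccard similarity over the surveys in which each term appears, and takes "most influential path" to mean the path whose product of edge weights is largest. Both the similarity measure and the path criterion are illustrative choices; the abstract does not specify them.

```python
import heapq
import math

# Hypothetical toy metadata: each survey lists the terms it uses.
surveys = {
    "survey_a": {"income", "housing", "education"},
    "survey_b": {"housing", "segregation"},
    "survey_c": {"education", "segregation", "health"},
    "survey_d": {"income", "housing"},
}


def connectivity_matrix(surveys):
    """Jaccard similarity between terms, based on which surveys use them."""
    terms = sorted(set().union(*surveys.values()))
    used_in = {t: {s for s, ts in surveys.items() if t in ts} for t in terms}
    matrix = {a: {} for a in terms}
    for a in terms:
        for b in terms:
            if a != b:
                shared = len(used_in[a] & used_in[b])
                total = len(used_in[a] | used_in[b])
                matrix[a][b] = shared / total
    return matrix


def strongest_path(matrix, source, target):
    """Path maximizing the product of edge weights (Dijkstra on -log w)."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, w in matrix[node].items():
            if w <= 0.0:
                continue  # no connection between these terms
            new_cost = cost - math.log(w)  # weights are in (0, 1], so cost >= 0
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-best[target])


matrix = connectivity_matrix(surveys)
path, strength = strongest_path(matrix, "income", "health")
print(path, round(strength, 3))
```

In this toy data, "income" and "health" never appear in the same survey, yet the sketch links them through "education". This is the kind of indirect tie between seemingly unrelated attributes that the abstract describes. A different criterion, such as fewest hops or a spatiotemporal weight, would only require changing how the edge cost is computed.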
Data availability
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Funding
National Science Foundation, 2112356, 2122054, and 2232533, Xinyue Ye.
Ethics declarations
Conflict of interest
The authors certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
About this article
Cite this article
Ye, X., Lian, X., Xu, H. et al. An integrated space–time framework for linkage discovery of big survey data. Spat. Inf. Res. 32, 195–206 (2024). https://doi.org/10.1007/s41324-023-00553-x
DOI: https://doi.org/10.1007/s41324-023-00553-x