Abstract
This chapter presents novel solutions for storing and querying large knowledge graphs of mobility data represented in RDF. Such knowledge graphs are generated and updated daily from the incoming positional information of moving entities, possibly linked with contextual information and weather data. Coping with their massive size raises several challenges in distributed storage and parallel query processing. The chapter describes the design and implementation of a parallel processing engine for spatiotemporal RDF data, built on top of Apache Spark. The engine's storage layer holds deliberately encoded spatiotemporal RDF triples together with a dictionary of mappings between integer identifiers and RDF resources, and it uses property tables and a columnar storage layout for improved performance. Its processing layer consists of a query parsing component, a logical query builder, and a physical query constructor, which together produce execution plans that efficiently handle spatiotemporal constraints along with SPARQL processing. Experiments over large knowledge graphs of real-life mobility data demonstrate the performance of the engine.
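The dictionary mentioned in the abstract, which maps integer identifiers to RDF resources, can be illustrated with a minimal sketch. It is written in plain Python rather than Spark and is not the chapter's actual encoding scheme: the class name `ResourceDictionary`, the helper functions, and the `ex:` resources are all hypothetical, and the sketch ignores the spatiotemporal part of the encoding entirely.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
EncodedTriple = Tuple[int, int, int]


class ResourceDictionary:
    """Hypothetical two-way mapping between RDF resources and integer ids."""

    def __init__(self) -> None:
        self._to_id: Dict[str, int] = {}
        self._to_resource: List[str] = []

    def encode(self, resource: str) -> int:
        # Assign the next free integer the first time a resource is seen,
        # and return the same integer on every later lookup.
        if resource not in self._to_id:
            self._to_id[resource] = len(self._to_resource)
            self._to_resource.append(resource)
        return self._to_id[resource]

    def decode(self, identifier: int) -> str:
        return self._to_resource[identifier]


def encode_triples(
    triples: Iterable[Triple], dictionary: ResourceDictionary
) -> List[EncodedTriple]:
    """Replace each subject, predicate, and object with its integer id."""
    return [
        (dictionary.encode(s), dictionary.encode(p), dictionary.encode(o))
        for s, p, o in triples
    ]


def decode_triple(
    encoded: EncodedTriple, dictionary: ResourceDictionary
) -> Triple:
    """Map an encoded triple back to its original resources."""
    s, p, o = encoded
    return (dictionary.decode(s), dictionary.decode(p), dictionary.decode(o))


# Illustrative data only; these resources do not come from the chapter.
sample_triples: List[Triple] = [
    ("ex:vessel1", "ex:hasPosition", "ex:pos1"),
    ("ex:vessel1", "ex:hasSpeed", "12.5"),
]
resource_dictionary = ResourceDictionary()
encoded_triples = encode_triples(sample_triples, resource_dictionary)
```

Storing triples as small integers instead of long strings keeps the triple store compact and makes joins cheaper; the dictionary is consulted only when results have to be turned back into readable resources.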
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nikitopoulos, P., Koutroumanis, N., Vlachou, A., Doulkeridis, C., Vouros, G.A. (2020). Distributed Storage of Large Knowledge Graphs with Mobility Data. In: Vouros, G., et al. Big Data Analytics for Time-Critical Mobility Forecasting. Springer, Cham. https://doi.org/10.1007/978-3-030-45164-6_7
DOI: https://doi.org/10.1007/978-3-030-45164-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-45163-9
Online ISBN: 978-3-030-45164-6
eBook Packages: Computer Science, Computer Science (R0)