Creating Large Size of Data with Apache Hadoop

Růžička, Jan; Kocich, David; Orčík, Lukáš; Svozilík, Vladislav

doi:10.1007/978-3-319-45123-7_22

Part of the book series: Lecture Notes in Geoinformation and Cartography (LNGC)

Abstract

The paper focuses on building large datasets using Apache Hadoop. Our team manages an information system that calculates the probability of existence of different objects in space and time. The system works with many different data sources, including large datasets. The data processing workflow is complicated and time consuming, so we looked for a framework that could help with system management and, if possible, speed up data processing as well. Apache Hadoop was selected as a platform to enhance our information system. Apache Hadoop is usually used for processing large datasets, but our information system needs to perform other types of tasks as well. The system computes spatio-temporal relations between different types of objects. This means that relatively large datasets (millions of records) are built from a relatively small number of records (thousands). A PostgreSQL/PostGIS database, or tools written in Java or another language, are usually used for this purpose. Our research aimed to determine whether we could simply move some of these tasks to the Apache Hadoop platform using a simple SQL editor such as Hive. We selected two types of common tasks and tested them on the PostgreSQL and Apache Hadoop (Hive) platforms to compare the time needed to complete them. The paper presents the results of our research.
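
The growth from thousands of input records to millions of output records can be illustrated with a minimal sketch. It is not the authors' workflow: the table and column names (`objects`, `relations`, `x`, `y`, `t`) and the synthetic data are assumptions made for illustration, and SQLite is used only so that the example runs with the Python standard library. SQLite is not one of the platforms tested in the paper. The sketch pairs every object with every other object and stores one relation row per ordered pair.

```python
import sqlite3


def build_pair_table(n_objects: int) -> int:
    """Pair every object with every other object and return the row count.

    Table and column names are hypothetical, not the authors' schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects (id INTEGER PRIMARY KEY, x REAL, y REAL, t INTEGER)"
    )
    # Synthetic coordinates and time stamps; real inputs would come from
    # the system's own data sources.
    rows = [(i, float(i % 100), float(i // 100), i % 24) for i in range(n_objects)]
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)
    # One output row per ordered pair of distinct objects: n * (n - 1) rows.
    conn.execute(
        """
        CREATE TABLE relations AS
        SELECT a.id AS id_a,
               b.id AS id_b,
               (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y) AS dist_sq,
               ABS(a.t - b.t) AS dt
        FROM objects a CROSS JOIN objects b
        WHERE a.id <> b.id
        """
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM relations").fetchone()
    conn.close()
    return count


if __name__ == "__main__":
    # 1,000 input records produce 1,000 * 999 relation records.
    print(build_pair_table(1000))
```

Because the output grows roughly with the square of the input size, a few thousand objects are enough to yield millions of relation rows, which is the kind of workload the abstract describes.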

Acknowledgments

Supported by a grant from the Student Grant Competition, FMG, VSB-TUO. We would like to thank all open source developers.

Author information

Authors and Affiliations

Institute of Geoinformatics, Faculty of Mining and Geology, VŠB-Technical University of Ostrava, 17. listopadu 15/2172, 708 33, Ostrava-Poruba, Czech Republic
Jan Růžička, David Kocich & Vladislav Svozilík
Department of Telecommunication, Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, 17. listopadu 15, 708 33, Ostrava, Czech Republic
Lukáš Orčík

Corresponding author

Correspondence to Jan Růžička.

Editor information

Editors and Affiliations

Institute of Geoinformatics, VŠB-Technical University of Ostrava, Ostrava, Moravskoslezsky, Czech Republic
Igor Ivan
Department of Geography and Planning, University of Liverpool, Liverpool, Merseyside, United Kingdom
Alex Singleton
Institute of Geoinformatics, VŠB-Technical University of Ostrava, Ostrava, Moravskoslezsky, Czech Republic
Jiří Horák
Institute of Geoinformatics, VŠB-Technical University of Ostrava, Ostrava, Moravskoslezsky, Czech Republic
Tomáš Inspektor

About this paper

Cite this paper

Růžička, J., Kocich, D., Orčík, L., Svozilík, V. (2017). Creating Large Size of Data with Apache Hadoop. In: Ivan, I., Singleton, A., Horák, J., Inspektor, T. (eds) The Rise of Big Spatial Data. Lecture Notes in Geoinformation and Cartography. Springer, Cham. https://doi.org/10.1007/978-3-319-45123-7_22

DOI: https://doi.org/10.1007/978-3-319-45123-7_22
Published: 15 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45122-0
Online ISBN: 978-3-319-45123-7
eBook Packages: Earth and Environmental Science (R0)
