Scaling of Complex Calculations over Big Data-Sets

Grzegorowski, Marek

doi:10.1007/978-3-319-09912-5_7


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8610)

Abstract

This article introduces a novel approach to scaling complex calculations in extensive IT infrastructures and presents significant case studies from the SONCA and DISESOR projects. The described system enables parallel calculations by providing dynamic data sharding without the need for direct integration with storage repositories. The presented solution does not require one phase of processing to complete before the next one starts; hence it is suitable for supporting many dependent calculations and can be used to provide scalability and robustness for whole data processing pipelines. The mechanism is designed to support data that is still emerging, so it is suitable for data streams, e.g. the transformation and analysis of data collected from multiple sensors. As shown in this article, the approach scales well and is attractive because it can be easily applied to data processing between heterogeneous systems.
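The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: shards that workers claim dynamically, and a dependent phase that starts before the previous phase has finished. All names (`make_shards`, `run_pipeline`, the sentinel convention) and the placeholder calculations are assumptions made for illustration and are not taken from the paper.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def make_shards(ids, shard_size):
    """Split a list of record ids into shards of at most shard_size ids."""
    return [ids[i:i + shard_size] for i in range(0, len(ids), shard_size)]


def run_pipeline(ids, shard_size=3, workers=2):
    shards = queue.Queue()        # shards waiting to be claimed by phase 1
    intermediate = queue.Queue()  # phase 1 output, read by phase 2 as it arrives
    results = []

    def phase_one():
        # Each worker claims the next free shard, so a faster worker
        # simply ends up processing more shards.
        while True:
            shard = shards.get()
            if shard is SENTINEL:
                break
            for record_id in shard:
                intermediate.put(record_id * record_id)  # placeholder calculation

    def phase_two():
        # Consumes phase 1 output without waiting for phase 1 to complete.
        while True:
            value = intermediate.get()
            if value is SENTINEL:
                break
            results.append(value + 1)  # placeholder dependent calculation

    producers = [threading.Thread(target=phase_one) for _ in range(workers)]
    consumer = threading.Thread(target=phase_two)
    for thread in producers + [consumer]:
        thread.start()

    # Shards are published after the workers are already running,
    # which imitates data that is still arriving.
    for shard in make_shards(ids, shard_size):
        shards.put(shard)
    for _ in producers:
        shards.put(SENTINEL)
    for thread in producers:
        thread.join()
    intermediate.put(SENTINEL)
    consumer.join()
    return sorted(results)
```

In this sketch the workers only see shard contents handed to them through a queue, so they need no direct access to the storage layer; a real deployment would replace the in-process queues with whatever coordination service the infrastructure provides.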

This research was partly supported by the Polish National Science Centre (NCN) grant DEC-2011/01/B/ST6/03867, as well as the Polish National Centre for Research and Development (NCBiR) grant PBS2/B9/20/2013 within the framework of the Applied Research Programmes. This publication has been co-financed with European Union funds by the European Social Fund.



Author information

Authors and Affiliations

Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, ul. Banacha 2, 02-097, Warsaw, Poland
Marek Grzegorowski


Editor information

Editors and Affiliations

University of Warsaw and Infobright Inc., Poland
Dominik Ślęzak
Department of Computer Science, Loughborough University, Loughborough, U.K.
Gerald Schaefer
Computer Science Department, University of British Columbia, 2366 Main Mall, P.O. Box, Vancouver, B.C., Canada
Son T. Vuong
Department of Information & Communication Engineering, Inha University, Korea
Yoo-Sung Kim


About this paper

Cite this paper

Grzegorowski, M. (2014). Scaling of Complex Calculations over Big Data-Sets. In: Ślęzak, D., Schaefer, G., Vuong, S.T., Kim, YS. (eds) Active Media Technology. AMT 2014. Lecture Notes in Computer Science, vol 8610. Springer, Cham. https://doi.org/10.1007/978-3-319-09912-5_7


DOI: https://doi.org/10.1007/978-3-319-09912-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09911-8
Online ISBN: 978-3-319-09912-5
eBook Packages: Computer Science, Computer Science (R0)
