Scaling of Complex Calculations over Big Data-Sets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8610)

Abstract

This article introduces a novel approach to scaling complex calculations in extensive IT infrastructures and presents significant case studies from the SONCA and DISESOR projects. The described system enables parallel calculations by providing dynamic data sharding without requiring direct integration with storage repositories. The presented solution does not require one phase of processing to complete before the next one starts; hence it is suitable for supporting many dependent calculations and can be used to provide scalability and robustness for whole data processing pipelines. The introduced mechanism is designed to support data that is still arriving, which makes it suitable for data streams, e.g. the transformation and analysis of data collected from multiple sensors. As shown in this article, the approach scales well and is attractive because it can be easily applied to data processing between heterogeneous systems.
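
The two ideas named in the abstract, per-record shard assignment and dependent phases that overlap rather than run one after another, can be illustrated with a small sketch. This is not the mechanism described in the paper, which is not reproduced on this page. The stage names (`parse`, `score`), the toy hash in `shard_of`, and the `sensor_stream` source are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]


def shard_of(key: str, num_shards: int) -> int:
    # Hypothetical policy: a stable toy hash (sum of UTF-8 bytes) modulo the
    # shard count. It needs only the record key, not the storage backend.
    return sum(key.encode("utf-8")) % num_shards


def run_stage(name: str, fn: Callable[[int], int],
              records: Iterable[Record], log: List[str]) -> Iterator[Record]:
    # A lazy stage: each record is handled as soon as it arrives, so the
    # next stage can start before this one has seen the whole input.
    for key, value in records:
        log.append(f"{name}:{key}")
        yield key, fn(value)


def process(source: Iterable[Record],
            num_shards: int) -> Tuple[Dict[int, List[Record]], List[str]]:
    log: List[str] = []
    parsed = run_stage("parse", lambda v: v * 2, source, log)
    scored = run_stage("score", lambda v: v + 1, parsed, log)  # depends on parse
    shards: Dict[int, List[Record]] = {i: [] for i in range(num_shards)}
    for key, value in scored:
        # Shards are assigned per record while the stream is still arriving.
        shards[shard_of(key, num_shards)].append((key, value))
    return shards, log


def sensor_stream() -> Iterator[Record]:
    # Stand-in for data that keeps arriving, e.g. sensor readings.
    yield ("a", 1)
    yield ("b", 2)
    yield ("c", 3)


shards, log = process(sensor_stream(), num_shards=2)
```

In this sketch the log interleaves the two stages record by record, which is the sense in which a later phase does not wait for an earlier one to finish. In a real deployment each shard would presumably be handed to a separate worker; here everything runs in one process to keep the example deterministic.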

This research was partly supported by the Polish National Science Centre (NCN) grant DEC-2011/01/B/ST6/03867, as well as by the Polish National Centre for Research and Development (NCBiR) grant PBS2/B9/20/2013 within the framework of the Applied Research Programmes. This publication has been co-financed with European Union funds by the European Social Fund.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Grzegorowski, M. (2014). Scaling of Complex Calculations over Big Data-Sets. In: Ślęzak, D., Schaefer, G., Vuong, S.T., Kim, YS. (eds) Active Media Technology. AMT 2014. Lecture Notes in Computer Science, vol 8610. Springer, Cham. https://doi.org/10.1007/978-3-319-09912-5_7

  • DOI: https://doi.org/10.1007/978-3-319-09912-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09911-8

  • Online ISBN: 978-3-319-09912-5

  • eBook Packages: Computer Science, Computer Science (R0)
