Big Data Movement: A Challenge in Data Processing

Part of the Studies in Big Data book series (SBD, volume 9)

Abstract

This chapter discusses modern methods of data processing, especially data parallelization and data processing by bio-inspired methods. Novel methods are synthesized by selected evolutionary algorithms and demonstrated on astrophysical data sets. Such an approach is now characteristic of so-called Big Data and Big Analytics. First, we describe some new database architectures that support Big Data storage and processing. We also discuss selected Big Data issues, specifically data sources, characteristics, processing, and analysis. Particular attention is devoted to parallelism in the service of data processing, and we discuss this topic in detail. We show how new technologies encourage programmers to consider parallel processing not only in a distributed way (horizontal scaling), but also within each server (vertical scaling). The chapter also discusses in depth the interdisciplinary intersection of astrophysics and computer science, known as astroinformatics, including a variety of data sources and examples. The last part of the chapter is devoted to selected bio-inspired methods and their application to simple model synthesis from astrophysical Big Data collections. We suggest a method by which new algorithms can be synthesized using a bio-inspired approach and demonstrate its application on an astronomy Big Data collection. The usability of these algorithms, along with general remarks on the limits of computing, is discussed at the conclusion of this chapter.
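
As a rough illustration of the distinction between horizontal and vertical scaling, the following minimal Python sketch applies a partition, map, and combine pattern inside a single machine. It is not taken from the chapter: the names `partition`, `partial_stats`, `parallel_mean`, and the `fluxes` sample are hypothetical, and a thread pool stands in for whatever workers are available. Under horizontal scaling the same chunks would instead be shipped to separate servers, with only the small partial results sent back.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_parts):
    """Split values into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(values), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


def partial_stats(chunk):
    """Map step: reduce one chunk to a (count, sum) pair."""
    return len(chunk), sum(chunk)


def parallel_mean(values, n_workers=4):
    """Combine step: merge per-chunk partials into a global mean."""
    chunks = partition(values, n_workers)
    # Assumption: a thread pool models workers within one server.
    # A process pool or a cluster scheduler could replace it
    # without changing the partition/map/combine structure.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count


# Hypothetical sample standing in for a column of measurements.
fluxes = [float(i) for i in range(1, 101)]
mean_flux = parallel_mean(fluxes)
```

The design point is that each worker returns only a small summary, so the combine step stays cheap whether the workers are cores in one server or nodes in a cluster.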

Keywords

  • Big Data
  • Big Analytics
  • Parallel processing
  • Astroinformatics
  • Bioinspired methods

Chapter length: 41 pages

Corresponding author

Correspondence to Jaroslav Pokorný.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Pokorný, J. et al. (2015). Big Data Movement: A Challenge in Data Processing. In: Hassanien, A., Azar, A., Snasael, V., Kacprzyk, J., Abawajy, J. (eds) Big Data in Complex Systems. Studies in Big Data, vol 9. Springer, Cham. https://doi.org/10.1007/978-3-319-11056-1_2

  • DOI: https://doi.org/10.1007/978-3-319-11056-1_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11055-4

  • Online ISBN: 978-3-319-11056-1

  • eBook Packages: Engineering, Engineering (R0)