The HPCC/ECL Platform for Big Data

Abstract

As a result of the continuing information explosion, many organizations are experiencing what is now called the “Big Data” problem: they cannot make effective use of the massive amounts of data they hold, because their datasets have grown too big to process in a timely manner. Data-intensive computing represents a new computing paradigm [26] that can address the big data problem using high-performance architectures supporting scalable parallel processing. This allows government, commercial, and research organizations to process massive amounts of data and implement new applications previously thought to be impractical or infeasible.

Keywords

  • Query Processing
  • Execution Environment
  • Slave Node
  • Distributed File System
  • Declarative Language

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.


References

  1. Kouzes RT, Anderson GA, Elbert ST, Gorton I, Gracio DK. The changing paradigm of data-intensive computing. Computer. 2009;42(1):26–34.

  2. Gorton I, Greenfield P, Szalay A, Williams R. Data-intensive computing in the 21st century. IEEE Comput. 2008;41(4):30–2.

  3. Johnston WE. High-speed, wide area, data intensive computing: a ten year retrospective. In: Proceedings of the 7th IEEE international symposium on high performance distributed computing: IEEE Computer Society; 1998.

  4. Skillicorn DB, Talia D. Models and languages for parallel computation. ACM Comput Surv. 1998;30(2):123–69.

  5. Dowd K, Severance C. High performance computing. Sebastopol: O’Reilly and Associates Inc.; 1998.

  6. Abbas A. Grid computing: a practical guide to technology and applications. Hingham: Charles River Media Inc; 2004.

  7. Gokhale M, Cohen J, Yoo A, Miller WM. Hardware technologies for high-performance data-intensive computing. IEEE Comput. 2008;41(4):60–8.

  8. Nyland LS, Prins JF, Goldberg A, Mills PH. A design methodology for data-parallel applications. IEEE Trans Softw Eng. 2000;26(4):293–314.

  9. Agichtein E, Ganti V. Mining reference tables for automatic text segmentation. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, WA, USA; 2004. p. 20–9.

  10. Agichtein E. Scaling information extraction to large document collections: Microsoft Research. 2004.

  11. Rencuzogullari U, Dwarkadas S. Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. In: Proceedings of the eighth ACM SIGPLAN symposium on principles and practices of parallel programming, Snowbird, UT; 2001. p. 72–81.

  12. Cerf VG. An information avalanche. IEEE Comput. 2007;40(1):104–5.

  13. Gantz JF, Reinsel D, Chute C, Schlichting W, McArthur J, Minton S, et al. The expanding digital universe (White Paper): IDC. 2007.

  14. Lyman P, Varian HR. How much information? 2003 (Research Report). School of Information Management and Systems, University of California at Berkeley; 2003.

  15. Berman F. Got data? A guide to data preservation in the information age. Commun ACM. 2008;51(12):50–6.

  16. NSF. Data-intensive computing. National Science Foundation. 2009. http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS. Retrieved 10 Aug 2009.

  17. PNNL. Data intensive computing. Pacific Northwest National Laboratory. 2008. http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt. Retrieved 10 Aug 2009.

  18. Buyya R, Yeo CS, Venugopal S, Broberg J, Brandic I. Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener Comput Syst. 2009;25(6):599–616.

  19. Gray J. Distributed computing economics. ACM Queue. 2008;6(3):63–8.

  20. Bryant RE. Data intensive scalable computing. Carnegie Mellon University. 2008. http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt. Retrieved 10 Aug 2009.

  21. Middleton AM. Data-intensive computing solutions (Whitepaper): LexisNexis. 2009.

  22. Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters. In: Proceedings of the sixth symposium on operating system design and implementation (OSDI); 2004.

  23. Dean J, Ghemawat S. MapReduce: a flexible data processing tool. Commun ACM. 2010;53(1):72–7.

  24. Pike R, Dorward S, Griesemer R, Quinlan S. Interpreting the data: parallel analysis with Sawzall. Sci Program J. 2004;13(4):227–98.

  25. White T. Hadoop: the definitive guide. 1st ed. Sebastopol: O’Reilly Media Inc; 2009.

  26. Gates AF, Natkovich O, Chopra S, Kamath P, Narayanamurthy SM, Olston C, et al. Building a high-level dataflow system on top of map-reduce: the Pig experience. In: Proceedings of the 35th international conference on very large databases (VLDB 2009), Lyon, France; 2009.

  27. Olston C, Reed B, Srivastava U, Kumar R, Tomkins A. Pig Latin: a not-so-foreign language for data processing. In: Proceedings of the 28th ACM SIGMOD/PODS international conference on management of data/principles of database systems, Vancouver, BC, Canada; 2008. p. 1099–110.

  28. Bayliss DA. Enterprise control language overview (Whitepaper): LexisNexis. 2010b.

  29. Bayliss DA. Thinking declaratively (Whitepaper). 2010c.

  30. Hellerstein JM. The declarative imperative. SIGMOD Rec. 2010;39(1):5–19.

  31. O’Malley O. Introduction to Hadoop. 2008. http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/YahooHadoopIntro-apachecon-us-2008.pdf. Retrieved 10 Aug 2009.

  32. Bayliss DA. Aggregated data analysis: the paradigm shift (Whitepaper): LexisNexis. 2010a.

  33. Buyya R. High performance cluster computing. Upper Saddle River: Prentice Hall; 1999.

  34. Chaiken R, Jenkins B, Larson P-A, Ramsey B, Shakib D, Weaver S, et al. Scope: easy and efficient parallel processing of massive data sets. Proc VLDB Endow. 2008;1:1265–76.

  35. Grossman R, Gu Y. Data mining using high performance data clouds: experimental studies using sector and sphere. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, Nevada, USA; 2008.

  36. Grossman RL, Gu Y, Sabala M, Zhang W. Compute and storage clouds using wide area high performance networks. Future Gener Comput Syst. 2009;25(2):179–83.

  37. Gu Y, Grossman RL. Lessons learned from a year’s worth of benchmarks of large data clouds. In: Proceedings of the 2nd workshop on many-task computing on grids and supercomputers, Portland, Oregon; 2009.

  38. Liu H, Orban D. Gridbatch: cloud computing for large-scale data-intensive batch applications. In: Proceedings of the eighth IEEE international symposium on cluster computing and the grid; 2008. p. 295–305.

  39. Llor X, Acs B, Auvil LS, Capitanu B, Welge ME, Goldberg DE. Meandre: semantic-driven data-intensive flows in the clouds. In: Proceedings of the fourth IEEE international conference on eScience; 2008. p. 238–245.

  40. Pavlo A, Paulson E, Rasin A, Abadi DJ, Dewitt DJ, Madden S, et al. A comparison of approaches to large-scale data analysis. In: Proceedings of the 35th SIGMOD international conference on management of data, Providence, RI; 2009. p. 165–68.

  41. Ravichandran D, Pantel P, Hovy E. The terascale challenge. In: Proceedings of the KDD workshop on mining for and from the semantic web; 2004.

  42. Yu Y, Gunda PK, Isard M. Distributed aggregation for data-parallel computing: interfaces and implementations. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles, Big Sky, Montana, USA; 2009. p. 247–60.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Middleton, A.M., Bayliss, D.A., Halliday, G., Chala, A., Furht, B. (2016). The HPCC/ECL Platform for Big Data. In: Big Data Technologies and Applications. Springer, Cham. https://doi.org/10.1007/978-3-319-44550-2_6

  • DOI: https://doi.org/10.1007/978-3-319-44550-2_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44548-9

  • Online ISBN: 978-3-319-44550-2

  • eBook Packages: Computer Science, Computer Science (R0)