Abstract
As a result of the continuing information explosion, many organizations are drowning in data, and the resulting “data gap”, the inability to process this information and use it effectively, is widening at an alarming rate. Data-intensive computing represents a new computing paradigm (Kouzes, Anderson, Elbert, Gorton, & Gracio, 2009) that can address the data gap by using scalable parallel processing, allowing government, commercial organizations, and research environments to process massive amounts of data and implement applications previously thought to be impractical or infeasible. Cloud computing gives organizations with limited internal resources the opportunity to implement large-scale data-intensive computing applications cost-effectively.
References
Abbas, A. (2004). Grid computing: A practical guide to technology and applications. Hingham, MA: Charles River Media.
Agichtein, E. (2005). Scaling information extraction to large document collections. IEEE Data Engineering Bulletin, 28, 3–10.
Agichtein, E., & Ganti, V. (2004). Mining reference tables for automatic text segmentation. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, 20–29.
Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski, A., et al. (2009). Above the clouds: A Berkeley view of cloud computing (University of California at Berkeley, Tech. Rep. UCB/EECS-2009-28).
Berman, F. (2008). Got data? A guide to data preservation in the information age. Communications of the ACM, 51(12), 50–56.
Borthakur, D. (2008). Hadoop distributed file system. Available from: http://www.opendocs.net/apache/hadoop/HDFSDescription.pdf.
Bryant, R. E. (2008). Data intensive scalable computing. Retrieved January 5, 2010, from: http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt.
Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J., & Brandic, I. (2009). Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, 25(6), 599–616.
Cerf, V. G. (2007). An information avalanche. IEEE Computer, 40(1), 104–105.
Chaiken, R., Jenkins, B., Larson, P.-A., Ramsey, B., Shakib, D., Weaver, S., et al. (2008). SCOPE: Easy and efficient parallel processing of massive data sets. Proceedings of the VLDB Endowment, New York, NY.
Chang, F., Dean, J., Ghemawat, S., Hsieh, W. C., Wallach, D. A., Burrows, M., et al. (2006). Bigtable: A distributed storage system for structured data. Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI’06), Seattle, WA.
Dean, J., & Ghemawat, S. (2004). MapReduce: Simplified data processing on large clusters. Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI), Boston, MA.
Gantz, J. F., Reinsel, D., Chute, C., Schlichting, W., McArthur, J., Minton, S., et al. (2007). The expanding digital universe. IDC, White Paper.
Gates, A. F., Natkovich, O., Chopra, S., Kamath, P., Narayanamurthy, S. M., Olston, C., et al. (2009). Building a high-level dataflow system on top of map-reduce: The pig experience. Proceedings of the 35th International Conference on Very Large Databases (VLDB 2009), Lyon, France.
Gokhale, M., Cohen, J., Yoo, A., & Miller, W. M. (2008). Hardware technologies for high-performance data-intensive computing. IEEE Computer, 41(4), 60–68.
Gorton, I., Greenfield, P., Szalay, A., & Williams, R. (2008). Data-intensive computing in the 21st century. IEEE Computer, 41(4), 30–32.
Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google file system. Proceedings of the 19th ACM Symposium on Operating Systems Principles, New York, NY.
Gray, J. (2008). Distributed computing economics. ACM Queue, 6(3), 63–68.
Grossman, R. L. (2009). The case for cloud computing. IT Professional, 11(2), 23–27.
Grossman, R., & Gu, Y. (2008). Data mining using high performance data clouds: Experimental studies using sector and sphere. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY.
Grossman, R. L., & Gu, Y. (2009). On the varieties of clouds for data intensive computing. Available from: http://sites.computer.org/debull/A09mar/grossman.pdf.
Grossman, R. L., Gu, Y., Sabala, M., & Zhang, W. (2009). Compute and storage clouds using wide area high performance networks. Future Generation Computer Systems, 25(2), 179–183.
Gu, Y., & Grossman, R. L. (2009). Lessons learned from a year’s worth of benchmarks of large data clouds. Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, Portland, OR.
Hayes, B. (2008). Cloud computing. Communications of the ACM, 51(7), 9–11.
Johnston, W. E. (1998). High-speed, wide area, data intensive computing: A ten year retrospective. Proceedings of the 7th IEEE International Symposium on High-Performance Distributed Computing, Chicago, IL, 280.
Kouzes, R. T., Anderson, G. A., Elbert, S. T., Gorton, I., & Gracio, D. K. (2009). The changing paradigm of data-intensive computing. Computer, 42(1), 26–34.
Lenk, A., Klems, M., Nimis, J., Tai, S., & Sandholm, T. (2009). What’s inside the cloud? An architectural map of the cloud landscape. Proceedings of the 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing. Vancouver, Canada, 23–31.
Levitt, N. (2009). Is cloud computing really ready for prime time? Computer, 42(1), 15–20.
Liu, H., & Orban, D. (2008). GridBatch: Cloud computing for large-scale data-intensive batch applications. Proceedings of the 8th IEEE International Symposium on Cluster Computing and the Grid, Cardiff.
Llor, X., Acs, B., Auvil, L. S., Capitanu, B., Welge, M. E., & Goldberg, D. E. (2008). Meandre: Semantic-driven data-intensive flows in the clouds. Proceedings of the 4th IEEE International Conference on eScience, Nottingham.
Lyman, P., & Varian, H. R. (2003). How much information? (School of Information Management and Systems, University of California at Berkeley, Research Rep.).
Mell, P., & Grance, T. (2009). The NIST definition of cloud computing. Retrieved January 5, 2010, from: http://csrc.nist.gov/groups/SNS/cloud-computing/cloud-def-v15.doc.
Napper, J., & Bientinesi, P. (2009). Can cloud computing reach the Top500? Conference On Computing Frontiers. Proceedings of the combined workshops on UnConventional high performance computing workshop plus memory access workshop, Ischia, Italy.
Nicosia, M. (2009). Hadoop cluster management. Retrieved January 5, 2010, from: http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/Hadoop-USENIX09.pdf.
Nyland, L. S., Prins, J. F., Goldberg, A., & Mills, P. H. (2000). A design methodology for data-parallel applications. IEEE Transactions on Software Engineering, 26(4), 293–314.
NSF. (2009). Data-intensive computing. Retrieved January 5, 2010, from: http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS.
O’Malley, O. (2008). Introduction to Hadoop. Available from: http://wiki.apache.org/hadoop/HadoopPresentations/attachments/YahooHadoopIntro-apachecon-us-2008.pdf.
O’Malley, O., & Murthy, A. C. (2009). Winning a 60 second dash with a yellow elephant. Retrieved January 5, 2010, from: http://sortbenchmark.org/Yahoo2009.pdf.
Olston, C. (2009). Pig overview presentation – Hadoop summit. Retrieved January 5, 2010, from: http://infolab.stanford.edu/~olston/pig.pdf.
Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008a). Pig Latin: A not-so-foreign language for data processing (Presentation at SIGMOD 2008). Retrieved January 5, 2010, from: http://i.stanford.edu/~usriv/talks/sigmod08-pig-latin.ppt#283,18,User-Code as a First-Class Citizen.
Olston, C., Reed, B., Srivastava, U., Kumar, R., & Tomkins, A. (2008b). Pig Latin: A not-so-foreign language for data processing. Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data/Principles of Database Systems, Vancouver, BC.
Pavlo, A., Paulson, E., Rasin, A., Abadi, D. J., DeWitt, D. J., Madden, S., et al. (2009). A comparison of approaches to large-scale data analysis. Proceedings of the 35th SIGMOD International Conference on Management of Data, New York, NY.
PNNL. (2008). Data intensive computing. Retrieved January 5, 2010, from: http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt.
Pike, R., Dorward, S., Griesemer, R., & Quinlan, S. (2004). Interpreting the data: Parallel analysis with Sawzall. Scientific Programming Journal, 13(4), 227–298.
Ravichandran, D., Pantel, P., & Hovy, E. (2004). The terascale challenge. Proceedings of the KDD Workshop on Mining for and from the Semantic Web, Boston, MA.
Rencuzogullari, U., & Dwarkadas, S. (2001). Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, San Diego, CA, 72–81.
Reese, G. (2009). Cloud application architectures. Sebastopol, CA: O’Reilly.
Skillicorn, D. B., & Talia, D. (1998). Models and languages for parallel computation. ACM Computing Surveys, 30(2), 123–169.
Vaquero, L. M., Rodero-Merino, L., Caceres, J., & Lindner, M. (2009). A break in the clouds: Towards a cloud definition. SIGCOMM Computer Communication Review, 39(1), 50–55.
Velte, A. T., Velte, T. J., & Elsenpeter, R. (2009). Cloud computing: A practical approach. New York, NY: McGraw Hill.
Venner, J. (2009). Pro Hadoop. New York, NY: Apress.
Viega, J. (2009). Cloud computing and the common man. Computer, 42(8), 106–108.
Weiss, A. (2007). Computing in the clouds. netWorker, 11(4), 16–25.
White, T. (2008). Understanding map reduce with Hadoop. Available from: http://wiki.apache.org/hadoop/HadoopPresentations.
White, T. (2009). Hadoop: The definitive guide. Sebastopol, CA: O’Reilly Media.
Yu, Y., Gunda, P. K., & Isard, M. (2009). Distributed aggregation for data-parallel computing: Interfaces and implementations. Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, Big Sky, MT.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Middleton, A.M. (2010). Data-Intensive Technologies for Cloud Computing. In: Furht, B., Escalante, A. (eds) Handbook of Cloud Computing. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6524-0_5
DOI: https://doi.org/10.1007/978-1-4419-6524-0_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6523-3
Online ISBN: 978-1-4419-6524-0
eBook Packages: Computer Science (R0)