Kouzes RT, Anderson GA, Elbert ST, Gorton I, Gracio DK. The changing paradigm of data-intensive computing. IEEE Comput. 2009;42(1):26–34.
Gorton I, Greenfield P, Szalay A, Williams R. Data-intensive computing in the 21st century. IEEE Comput. 2008;41(4):30–2.
Johnston WE. High-speed, wide area, data intensive computing: a ten year retrospective. In: Proceedings of the 7th IEEE international symposium on high performance distributed computing: IEEE Computer Society; 1998.
Skillicorn DB, Talia D. Models and languages for parallel computation. ACM Comput Surv. 1998;30(2):123–69.
Dowd K, Severance C. High performance computing. Sebastopol: O’Reilly and Associates Inc.; 1998.
Abbas A. Grid computing: a practical guide to technology and applications. Hingham: Charles River Media Inc; 2004.
Gokhale M, Cohen J, Yoo A, Miller WM. Hardware technologies for high-performance data-intensive computing. IEEE Comput. 2008;41(4):60–8.
Nyland LS, Prins JF, Goldberg A, Mills PH. A design methodology for data-parallel applications. IEEE Trans Softw Eng. 2000;26(4):293–314.
Agichtein E, Ganti V. Mining reference tables for automatic text segmentation. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, WA, USA; 2004. p. 20–9.
Agichtein E. Scaling information extraction to large document collections: Microsoft Research. 2004.
Rencuzogullari U, Dwarkadas S. Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations. In: Proceedings of the eighth ACM SIGPLAN symposium on principles and practices of parallel programming, Snowbird, UT; 2001. p. 72–81.
Cerf VG. An information avalanche. IEEE Comput. 2007;40(1):104–5.
Gantz JF, Reinsel D, Chute C, Schlichting W, McArthur J, Minton S, et al. The expanding digital universe (White Paper): IDC. 2007.
Lyman P, Varian HR. How much information? 2003 (Research Report). School of Information Management and Systems, University of California at Berkeley; 2003.
Berman F. Got data? A guide to data preservation in the information age. Commun ACM. 2008;51(12):50–6.
NSF. Data-intensive computing. National Science Foundation. 2009. http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS. Retrieved 10 Aug 2009.
PNNL. Data intensive computing. Pacific Northwest National Laboratory. 2008. http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt. Retrieved 10 Aug 2009.
Buyya R, Yeo CS, Venugopal S, Broberg J, Brandic I. Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility. Future Gener Comput Syst. 2009;25(6):599–616.
Gray J. Distributed computing economics. ACM Queue. 2008;6(3):63–8.
Bryant RE. Data intensive scalable computing. Carnegie Mellon University. 2008. http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt. Retrieved 10 Aug 2009.
Middleton AM. Data-intensive computing solutions (Whitepaper): LexisNexis. 2009.
Dean J, Ghemawat S. Mapreduce: simplified data processing on large clusters. In: Proceedings of the sixth symposium on operating system design and implementation (OSDI); 2004.
Dean J, Ghemawat S. Mapreduce: a flexible data processing tool. Commun ACM. 2010;53(1):72–7.
Pike R, Dorward S, Griesemer R, Quinlan S. Interpreting the data: parallel analysis with sawzall. Sci Program J. 2004;13(4):227–98.
White T. Hadoop: the definitive guide. 1st ed. Sebastopol: O’Reilly Media Inc; 2009.
Gates AF, Natkovich O, Chopra S, Kamath P, Narayanamurthy SM, Olston C, et al. Building a high-level dataflow system on top of map-reduce: the pig experience. In: Proceedings of the 35th international conference on very large databases (VLDB 2009), Lyon, France; 2009.
Olston C, Reed B, Srivastava U, Kumar R, Tomkins A. Pig latin: a not-so-foreign language for data processing. In: Proceedings of the 28th ACM SIGMOD/PODS international conference on management of data/principles of database systems, Vancouver, BC, Canada; 2008. p. 1099–110.
Bayliss DA. Enterprise control language overview (Whitepaper): LexisNexis. 2010b.
Bayliss DA. Thinking declaratively (Whitepaper). 2010c.
Hellerstein JM. The declarative imperative. SIGMOD Rec. 2010;39(1):5–19.
O’Malley O. Introduction to hadoop. 2008. http://wiki.apache.org/hadoop-data/attachments/HadoopPresentations/attachments/YahooHadoopIntro-apachecon-us-2008.pdf. Retrieved 10 Aug 2009.
Bayliss DA. Aggregated data analysis: the paradigm shift (Whitepaper): LexisNexis. 2010a.
Buyya R. High performance cluster computing. Upper Saddle River: Prentice Hall; 1999.
Chaiken R, Jenkins B, Larson P-A, Ramsey B, Shakib D, Weaver S, et al. Scope: easy and efficient parallel processing of massive data sets. Proc VLDB Endow. 2008;1:1265–76.
Grossman R, Gu Y. Data mining using high performance data clouds: experimental studies using sector and sphere. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining, Las Vegas, Nevada, USA; 2008.
Grossman RL, Gu Y, Sabala M, Zhang W. Compute and storage clouds using wide area high performance networks. Future Gener Comput Syst. 2009;25(2):179–83.
Gu Y, Grossman RL. Lessons learned from a year’s worth of benchmarks of large data clouds. In: Proceedings of the 2nd workshop on many-task computing on grids and supercomputers, Portland, Oregon; 2009.
Liu H, Orban D. Gridbatch: cloud computing for large-scale data-intensive batch applications. In: Proceedings of the eighth IEEE international symposium on cluster computing and the grid; 2008. p. 295–305.
Llorà X, Ács B, Auvil LS, Capitanu B, Welge ME, Goldberg DE. Meandre: semantic-driven data-intensive flows in the clouds. In: Proceedings of the fourth IEEE international conference on eScience; 2008. p. 238–45.
Pavlo A, Paulson E, Rasin A, Abadi DJ, Dewitt DJ, Madden S, et al. A comparison of approaches to large-scale data analysis. In: Proceedings of the 35th SIGMOD international conference on management of data, Providence, RI; 2009. p. 165–68.
Ravichandran D, Pantel P, Hovy E. The terascale challenge. In: Proceedings of the KDD workshop on mining for and from the semantic web; 2004.
Yu Y, Gunda PK, Isard M. Distributed aggregation for data-parallel computing: interfaces and implementations. In: Proceedings of the ACM SIGOPS 22nd symposium on operating systems principles, Big Sky, Montana, USA; 2009. p. 247–60.