Getting Started with Hadoop
Apache Hadoop is a software framework that allows distributed processing of large datasets across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of nodes. Rather than relying on hardware for high availability, it detects and handles failures at the application level, thereby delivering a highly available service on top of a cluster of commodity hardware nodes, each of which is prone to failure. While Hadoop can run on a single machine, its true power lies in its ability to scale up to thousands of computers, each with several processor cores. It also distributes large amounts of work across the cluster efficiently.
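As an illustration of what a simple programming model can look like, the following minimal sketch imitates a MapReduce-style word count in plain Python on a single machine. It does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical and chosen only for this example. On a real cluster the framework, not the user code, would distribute the input splits across nodes and perform the grouping step.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework on a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(splits):
    """Run map, shuffle, and reduce in sequence over a list of text splits."""
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one block of a larger file (illustrative data).
    print(word_count(["big data big", "data cluster"]))
```

Because each call to `map_phase` depends only on its own split, the map step could run on many machines at once, which is the property that lets this style of program scale with the size of the cluster.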
Keywords: Data Block, Master Node, Replication Factor, Slave Node, Hadoop Distributed File System