Distributed File System
The main objective of this chapter is to provide information and guidance for building a Hadoop distributed file system to address the big data classification problem. This system lets one implement, test, and evaluate the machine-learning techniques presented in this book for learning purposes. The objectives include a detailed explanation of the Hadoop framework and the Hadoop system, a presentation of Internet resources that can help you build a virtual machine-based Hadoop distributed file system with the R programming platform, and easy-to-follow, step-by-step instructions for building the RevolutionAnalytics’ RHadoop system for your big data computing environment. The objectives also include simple examples for testing the system to confirm that Hadoop works. A brief discussion of setting up a multi-node Hadoop system is also presented.
I would like to thank my graduate student Sumanth Reddy Yanala for helping to produce the drawing in Fig. 4.1. The information and discussions on “wrapletters” available at http://www.latex-community.org/forum/viewtopic.php?f=44&t=3798 helped with the formatting of several long continuous text strings, such as Uniform Resource Locators (URLs), in this book.