Apache Hadoop is an open-source platform for storage and efficient processing of large datasets on a cluster of computers. The framework provides fault tolerance, high availability, and scalability, being able to process petabytes of data. Its principal components are MapReduce and HDFS.
Apache Hadoop is a distributed framework used to tackle Big Data. It is a software platform in a master/worker architecture with three main components: HDFS, YARN, and MapReduce. The HDFS (Hadoop Distributed File System) is an abstraction layer responsible for the storage of data. MapReduce is the data processing framework designed specifically to scale and run distributed. YARN (Yet Another Resource Negotiator) is a management platform responsible for handling resources in the cluster. Hadoop’s open-source software was written in Java and distributed under Apache license 2.0.
The Hadoop framework can be...
- Cutting D (2016) https://www.youtube.com/watch?v=Phjif53vAhM . Accessed 20 Oct 2017
- Harris D (2013) https://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/ . Accessed 20 Oct 2017
- White T (2015) Hadoop: the definitive guide, 4th edn. O’Reilly Media, Hadoop Google Scholar