History
Apache Hadoop
Hadoop Ecosystem
Related Projects
Ref: https://tropars.github.io/downloads/lectures/LSDM/LSDM-2-mapreduce-hadoop.pdf
Todo: https://data-flair.training/blogs/hadoop-ecosystem-components/
- Hadoop is a robust, scalable computing platform that provides a distributed file system (HDFS) and a MapReduce processing framework.
- Wherever you find HBase, you’ll find Hadoop and other infrastructure components that you can use in your own applications, such as Apache Hive (a data warehousing tool), Apache Pig (a parallel data processing tool), and many others.
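To make the MapReduce idea concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration, and a real cluster would run the map and reduce steps in parallel on many nodes with the input stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, the step a framework performs between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence (illustrative only)."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop hdfs", "hadoop yarn"]))
```

The three functions are kept separate on purpose: in this sketch only `map_phase` and `reduce_phase` correspond to code a user would write, while the grouping done by `shuffle_phase` stands in for work the framework handles.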
Hadoop Ecosystem


YARN
HDFS
Architecture Overview and Execution
Ref: https://cse.buffalo.edu/~jing/cse601/fa12/materials/clustering_mapreduce.pdf