Setting up a Hadoop environment is the first step in working with Hadoop. This module guides you through installing a single-node (pseudo-distributed) Hadoop setup on your local machine. We will cover the following steps:
- Prerequisites
- Downloading Hadoop
- Configuring Hadoop
- Starting Hadoop Services
- Verifying the Installation
Prerequisites
Before setting up Hadoop, ensure that your system meets the following requirements:
- Java Development Kit (JDK): Hadoop requires Java to run. Ensure you have JDK installed on your system.
- SSH: Hadoop uses SSH for communication between nodes. Ensure SSH is installed and configured.
Checking Java Installation
To check if Java is installed, open a terminal and run:
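For example (the exact output varies by JDK vendor and version; Hadoop 3.3 runs on Java 8 or 11):

```shell
java -version
```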
If Java is not installed, download and install it from the official Oracle website or use your package manager (e.g., apt-get for Ubuntu).
Installing SSH
On Ubuntu, you can install SSH using:
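A typical sequence on Ubuntu/Debian (package names may differ on other distributions); the key-generation steps are only needed if you want the passwordless SSH to localhost that Hadoop's start scripts rely on:

```shell
# Install the SSH client and server
sudo apt-get update
sudo apt-get install -y ssh

# Set up passwordless SSH to localhost for the Hadoop start scripts
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```

You can confirm it works with `ssh localhost`, which should log you in without prompting for a password.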
Downloading Hadoop
Download the latest stable version of Hadoop from the Apache Hadoop releases page.
For example, to download Hadoop 3.3.1, use the following command:
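One way to fetch it (older releases move to the Apache archive, so the URL for the current stable release may differ):

```shell
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
```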
Extract the downloaded tarball:
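For the 3.3.1 tarball downloaded above:

```shell
tar -xzvf hadoop-3.3.1.tar.gz
```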
Move the extracted directory to /usr/local:
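Renaming the directory to a version-free path keeps the environment variables below simple; the `chown` is an optional convenience so later commands do not need sudo:

```shell
sudo mv hadoop-3.3.1 /usr/local/hadoop
# Optional: make your user the owner of the installation
sudo chown -R "$USER" /usr/local/hadoop
```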
Configuring Hadoop
Setting Environment Variables
Add the following lines to your .bashrc or .zshrc file to set Hadoop environment variables:
```shell
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
```
Apply the changes:
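Reload your shell configuration (substitute ~/.zshrc if that is your shell):

```shell
source ~/.bashrc
```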
Configuring Hadoop Files
Edit the following configuration files located in the $HADOOP_HOME/etc/hadoop directory:
core-site.xml
```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```
hdfs-site.xml
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/local/hadoop/hadoop_data/hdfs/datanode</value>
  </property>
</configuration>
```
mapred-site.xml
```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
yarn-site.xml
```xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```
Starting Hadoop Services
Formatting the NameNode
Before starting Hadoop services for the first time, format the NameNode. This initializes the HDFS metadata directories; do not repeat it on an existing installation, as it erases the filesystem metadata:
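Assuming $HADOOP_HOME/bin is on your PATH as configured above:

```shell
hdfs namenode -format
```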
Starting HDFS
Start the HDFS services:
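The start script lives in $HADOOP_HOME/sbin, which the PATH set earlier includes:

```shell
start-dfs.sh
# jps should now list NameNode, DataNode, and SecondaryNameNode
jps
```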
Starting YARN
Start the YARN services:
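Again using the bundled scripts from $HADOOP_HOME/sbin:

```shell
start-yarn.sh
# jps should additionally list ResourceManager and NodeManager
jps
```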
Verifying the Installation
To verify that Hadoop is running correctly, you can use the following commands:
Checking HDFS
List the contents of the root directory in HDFS:
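On a fresh installation the listing will be empty, which is expected; the command succeeding at all confirms the NameNode is reachable:

```shell
hdfs dfs -ls /
```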
Accessing Web Interfaces
- HDFS Web UI: Open http://localhost:9870 in your web browser.
- YARN ResourceManager Web UI: Open http://localhost:8088 in your web browser.
Conclusion
In this module, we covered the steps to set up a Hadoop environment on your local machine. You learned how to install prerequisites, download and configure Hadoop, start Hadoop services, and verify the installation. With your Hadoop environment set up, you are now ready to explore Hadoop's core components and functionalities in the subsequent modules.