Setting up a Hadoop environment is the first step toward working with Hadoop. This module walks you through installing a single-node (pseudo-distributed) Hadoop setup on your local machine. We will cover the following steps:

  1. Prerequisites
  2. Downloading Hadoop
  3. Configuring Hadoop
  4. Starting Hadoop Services
  5. Verifying the Installation

  1. Prerequisites

Before setting up Hadoop, ensure that your system meets the following requirements:

  • Java Development Kit (JDK): Hadoop requires Java to run. Ensure you have JDK installed on your system.
  • SSH: Hadoop's control scripts use SSH to start and stop daemons on cluster nodes, including localhost. Ensure SSH is installed and that passwordless login to localhost is configured (covered below).

Checking Java Installation

To check if Java is installed, open a terminal and run:

java -version

If Java is not installed, download and install it from the official Oracle website or use your package manager (e.g., apt-get for Ubuntu).
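
For example, on Ubuntu you could install OpenJDK 11, which Hadoop 3.3.x can run on (Java 8 also works); the package name below assumes Ubuntu's default repositories:

sudo apt-get install openjdk-11-jdk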

Installing SSH

On Ubuntu, you can install SSH using:

sudo apt-get install openssh-server
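
Hadoop's start-up scripts (start-dfs.sh and start-yarn.sh) log in to localhost over SSH, so you will also want passwordless SSH to your own machine. A typical setup, assuming you do not already have an RSA key pair, looks like this:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost

The last command should log you in without prompting for a password.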

  2. Downloading Hadoop

Download the latest stable version of Hadoop from the Apache Hadoop releases page.

For example, to download Hadoop 3.3.1, use the following command:

wget https://downloads.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz

Extract the downloaded tarball:

tar -xzvf hadoop-3.3.1.tar.gz

Move the extracted directory to /usr/local:

sudo mv hadoop-3.3.1 /usr/local/hadoop
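
Because the directory was moved with sudo, it is now owned by root. To run Hadoop as your regular (non-root) user, as the rest of this module assumes, you may want to take ownership of the installation directory:

sudo chown -R $(whoami) /usr/local/hadoop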

  3. Configuring Hadoop

Setting Environment Variables

Add the following lines to your .bashrc or .zshrc file to set Hadoop environment variables:

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Apply the changes (substitute ~/.zshrc if that is the file you edited):

source ~/.bashrc

Configuring Hadoop Files

Edit the following configuration files located in the $HADOOP_HOME/etc/hadoop directory:
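
hadoop-env.sh

Hadoop reads JAVA_HOME from this file rather than from your shell environment, so set it explicitly. The path below assumes OpenJDK 11 on Ubuntu; adjust it to match the location of your JDK:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64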

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/hadoop/hadoop_data/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/hadoop/hadoop_data/hdfs/datanode</value>
    </property>
</configuration>
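
The namenode and datanode paths referenced above do not exist yet, so it is a good idea to create them before formatting HDFS:

mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode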

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
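
On Hadoop 3.x, MapReduce jobs submitted to YARN also need to know where the MapReduce libraries live. If example jobs later fail with a missing MRAppMaster class, add a property like the following inside the <configuration> block (the value mirrors the Apache single-node setup guide):

<property>
    <name>mapreduce.application.classpath</name>
    <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>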

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

  4. Starting Hadoop Services

Formatting the Namenode

Before starting Hadoop services for the first time, format the namenode. This only needs to be done once; reformatting later will erase HDFS metadata:

hdfs namenode -format

Starting HDFS

Start the HDFS services:

start-dfs.sh

Starting YARN

Start the YARN services:

start-yarn.sh
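
Checking the Daemons

With both scripts run, you can confirm that the expected daemons are up using the JDK's jps tool:

jps

On a single-node setup you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager among the listed processes (the process IDs will differ).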

  5. Verifying the Installation

To verify that Hadoop is running correctly, you can use the following commands:

Checking HDFS

List the contents of the root directory in HDFS:

hdfs dfs -ls /
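
To confirm that you can also write to HDFS, create a home directory for your user and copy a file into it (the file used below is just a convenient example):

hdfs dfs -mkdir -p /user/$(whoami)
hdfs dfs -put $HADOOP_HOME/etc/hadoop/core-site.xml /user/$(whoami)/
hdfs dfs -ls /user/$(whoami)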

Accessing Web Interfaces

  • HDFS Web UI: Open http://localhost:9870 in your web browser.
  • YARN ResourceManager Web UI: Open http://localhost:8088 in your web browser.
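
Running a Sample Job

As a final end-to-end check, you can run one of the MapReduce examples bundled with Hadoop on YARN. The jar path below assumes the Hadoop 3.3.1 layout used in this module:

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar pi 2 10

The job should complete and print a rough estimate of Pi to the terminal.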

Conclusion

In this module, we covered the steps to set up a Hadoop environment on your local machine. You learned how to install prerequisites, download and configure Hadoop, start Hadoop services, and verify the installation. With your Hadoop environment set up, you are now ready to explore Hadoop's core components and functionalities in the subsequent modules.
