[Hadoop article 02] Building a fully distributed Hadoop environment

Posted Jun 28, 2020 · 3 min read

There is nothing noble in being superior to others; true nobility is being superior to your former self.

Hadoop fully distributed environment

Writing a file distribution script

The application scenario is as follows: suppose there are three hosts, master1, slave1, and slave2.

To build a fully distributed cluster, you need to copy files from master1 to the slaves.

You can distribute a single file with a plain rsync command, as shown below, or use the script that follows to distribute files or folders.
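
A single-file distribution with rsync might look like this (the file path is illustrative):

rsync -rvl /opt/module/test.txt user@slave1:/opt/module/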

#!/bin/bash

# 1. Get the number of input arguments; exit if there are none.
#    $# is the number of command-line arguments.
pcount=$#
if ((pcount == 0)); then
    echo "no args"
    exit 1
fi

# 2. Get the file name.
#    $1 is the first command-line argument.
p1=$1
fname=$(basename "$p1")
echo "fname=$fname"

# 3. Resolve the parent directory to an absolute path.
pdir=$(cd -P "$(dirname "$p1")"; pwd)
echo "pdir=$pdir"

# 4. Get the current user name.
user=$(whoami)

# 5. Use rsync to distribute the file or folder to each target host.
#    slave1 and slave2 are the slave hostnames; adjust them to match
#    your own cluster (e.g. hadoop103, hadoop104).
for ((host = 1; host < 3; host++)); do
    echo "------------------- slave$host --------------"
    rsync -rvl "$pdir/$fname" "$user@slave$host:$pdir"
done
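
To call the script like a regular command, one option is to save it as xsync somewhere on the PATH (the /usr/local/bin location here is just one choice) and make it executable:

chmod +x xsync
sudo mv xsync /usr/local/bin/
# distribute a directory to slave1 and slave2
xsync /opt/module/test/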

Cluster planning

        hadoop102            hadoop103                     hadoop104
HDFS    NameNode, DataNode   DataNode                      SecondaryNameNode, DataNode
YARN    NodeManager          ResourceManager, NodeManager  NodeManager
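
For this plan to work, every node must be able to resolve the hadoop102-104 hostnames. A typical /etc/hosts entry set might look like this (the IP addresses are assumptions; substitute your own):

192.168.1.102 hadoop102
192.168.1.103 hadoop103
192.168.1.104 hadoop104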

Configure the cluster

The configuration files are under hadoop-2.7.2/etc/hadoop. Each XML snippet below goes inside the <configuration> element of the named file.

Configure core-site.xml

<!-- Specify the address of the NameNode in HDFS -->
<property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop102:9000</value>
</property>

<!-- Specify the storage directory of files generated when Hadoop runs -->
<property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

Configure hadoop-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure hdfs-site.xml

<!-- Specify the number of copies -->
<property>
        <name>dfs.replication</name>
        <value>3</value>
</property>

<!-- Specify the SecondaryNameNode host -->
<property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop104:50090</value>
</property>

Configure yarn-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure yarn-site.xml

<!-- How reducers obtain data (the shuffle service) -->
<property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
</property>

<!-- Specify the address of YARN's ResourceManager -->
<property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
</property>

Configure mapred-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure mapred-site.xml

[mapred-site.xml does not exist by default; copy mapred-site.xml.template and rename the copy to mapred-site.xml]
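
From the hadoop-2.7.2/etc/hadoop directory:

cp mapred-site.xml.template mapred-site.xml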

<!-- Specify that MapReduce jobs run on YARN -->
<property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
</property>

Distribute the configuration to the cluster

# xsync is the distribution script written above
xsync /opt/module/hadoop-2.7.2/

Format the NameNode

# This step must complete without errors; otherwise it has to be redone.
# Run it from the Hadoop root directory, on the NameNode host (hadoop102).
hadoop namenode -format
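
As a quick sanity check (assuming the hadoop.tmp.dir configured above and the default NameNode metadata layout), the metadata directory should now exist:

ls /opt/module/hadoop-2.7.2/data/tmp/dfs/name/current
# should list files such as fsimage_*, seen_txid, VERSION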

Cluster startup test

On host hadoop102, run sbin/start-dfs.sh to start HDFS.
On host hadoop103 (where the ResourceManager runs), run sbin/start-yarn.sh to start YARN.
Then use jps on each host to check the processes.
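
Given the cluster plan above, jps on each host should show roughly these daemons (plus the Jps process itself):

[hadoop102] NameNode, DataNode, NodeManager
[hadoop103] ResourceManager, NodeManager, DataNode
[hadoop104] SecondaryNameNode, DataNode, NodeManager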


Relevant information


This article's companion GitHub repo: https://github.com/zhutiansam...

Companion WeChat official account: FocusBigData

Reply with [Big Data Interview], [Big Data Interview Experience], or [Big Data Learning Roadmap] for a surprise.