Hadoop distributed environment

Posted May 26, 2020

Network environment

CentOS network configuration

The virtual machine network is set to NAT mode

IP               user/password (hadoop)   hostname   user/password (root)
192.168.182.15   hadoop/hadoop            master     root/root123
192.168.182.16   hadoop/hadoop            slave1     root/root123
192.168.182.17   hadoop/hadoop            slave2     root/root123

Host machine configuration

VMware Network Adapter VMnet8 settings

Virtual machine virtual network editor settings

Note that the default gateway inside each virtual machine must be the gateway shown in the virtual network editor, which is the gateway of the host's VMnet8 network: 192.168.182.2. DNS is set to 114.114.114.114.
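As a rough illustration of the gateway and DNS settings above, the static configuration for one virtual machine might look like the sketch below. Only the addresses come from this post; the interface name ens33, the /24 prefix, the CentOS 7 path /etc/sysconfig/network-scripts/ifcfg-ens33, and the helper name render_ifcfg are assumptions made for the example.

```shell
# Print the static-IP keys for one node's ifcfg file.
# Assumptions: CentOS 7 network-scripts, interface ens33, /24 network.
render_ifcfg() {
    local ip="$1"
    local dev="${2:-ens33}"
    cat <<EOF
TYPE=Ethernet
DEVICE=${dev}
NAME=${dev}
BOOTPROTO=static
ONBOOT=yes
IPADDR=${ip}
PREFIX=24
GATEWAY=192.168.182.2
DNS1=114.114.114.114
EOF
}

# Example for the master node; merge the printed keys into the existing
# /etc/sysconfig/network-scripts/ifcfg-ens33 as root, then restart the network:
#   render_ifcfg 192.168.182.15
```

The function only prints text, so the result can be compared with the existing file before anything is changed.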

Connecting the nodes

After switching to root, edit /etc/hosts and /etc/hostname

After changing the hostname, restart the network service or reboot for the change to take effect
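The hosts and hostname edits can be sketched as follows. The IP addresses and host names are the ones from the table at the top of this post; the helper name add_cluster_hosts is hypothetical, and taking the target file as a parameter is a choice made here so the function can be tried on a copy first.

```shell
# Append the cluster's name-resolution entries to a hosts file,
# skipping any line that is already present.
add_cluster_hosts() {
    local hosts_file="$1"
    local entry
    for entry in \
        "192.168.182.15 master" \
        "192.168.182.16 slave1" \
        "192.168.182.17 slave2"
    do
        grep -qxF "$entry" "$hosts_file" || echo "$entry" >> "$hosts_file"
    done
}

# As root on every node:
#   add_cluster_hosts /etc/hosts
# As root, using the name that belongs to the node (master, slave1 or slave2):
#   echo master > /etc/hostname
```

Because existing lines are skipped, running the function twice leaves the file unchanged.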

SSH password-free login

Generate a key pair, append the public key to the authorized_keys file, and set the permissions

Edit /etc/ssh/sshd_config

Enable passwordless SSH login between the virtual machines: on the master node, use scp to copy the id_rsa.pub file to the user's home folder on slave1

On the slave1 node, check that the file arrived, append it to the authorized_keys file, and then restart the SSH service

Verify from the master node

If the slave1 node has no .ssh folder when the file is transferred, create one and set its permissions to 700, then append the public key to the authorized_keys file and set that file's permissions to 600
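The key-installation step described above can be sketched as a small function. The 700 and 600 permissions are taken from the text; the helper name authorize_key and the commands listed in the comments are an illustrative reading of the steps, not a record of what was actually typed.

```shell
# Install a public key for passwordless login under a given home directory.
authorize_key() {
    local pubkey_file="$1"
    local home_dir="$2"
    mkdir -p "$home_dir/.ssh"
    chmod 700 "$home_dir/.ssh"                      # only the owner may enter
    cat "$pubkey_file" >> "$home_dir/.ssh/authorized_keys"
    chmod 600 "$home_dir/.ssh/authorized_keys"      # only the owner may read/write
}

# On master, as the hadoop user:
#   ssh-keygen -t rsa                        # creates ~/.ssh/id_rsa and id_rsa.pub
#   authorize_key ~/.ssh/id_rsa.pub "$HOME"  # lets master log in to itself
#   scp ~/.ssh/id_rsa.pub hadoop@slave1:~/
# On slave1, as the hadoop user:
#   authorize_key ~/id_rsa.pub "$HOME"
# Back on master:
#   ssh slave1                               # should not prompt for a password
```

sshd typically refuses keys when the .ssh folder or authorized_keys is writable by other users, which is why the permissions are set explicitly.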

Software Environment

JDK

Download the JDK in tar.gz format

Extract the files

Extract into the current folder

tar -zxvf jdk-8u144-linux-x64.tar.gz

Move the folder to the /usr directory

Add the environment variables at the end of /etc/profile

Reload the file so the changes take effect, then test
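The lines added to /etc/profile are not reproduced in this post, so the following is only a plausible sketch. It assumes the archive unpacked to a folder named jdk1.8.0_144 and that this folder was moved to /usr; the helper name jdk_profile_lines is hypothetical.

```shell
# Print the lines to append to /etc/profile for the JDK.
# Assumption: the JDK now lives in /usr/jdk1.8.0_144.
jdk_profile_lines() {
    local java_home="${1:-/usr/jdk1.8.0_144}"
    cat <<EOF
export JAVA_HOME=${java_home}
export PATH=\$JAVA_HOME/bin:\$PATH
EOF
}

# Typical sequence, run from the folder where the archive was extracted:
#   sudo mv jdk1.8.0_144 /usr
#   jdk_profile_lines | sudo tee -a /etc/profile
#   source /etc/profile
#   java -version          # should report the installed JDK
```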

Hadoop

File preparation

Select the hadoop version:
    hadoop-2.7.3

Extract the files

tar zxvf hadoop-2.7.3.tar.gz

After decompression

Rename folder

sudo mv hadoop-2.7.3 hadoop

Move it under /usr

A small test of how mv behaves was done here
sudo mv hadoop-2.7.3 hadoop
When the target is a new folder name, the folder is simply renamed
sudo mv hadoop-2.7.3 /usr
When the target is an existing path, the folder is moved under that path
sudo mv hadoop-2.7.3/* hadoop
When the source is the folder name plus /*, all files in the folder are moved into the other folder
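The three mv behaviours described above can be reproduced safely in a throwaway directory. The folder names a, b, c, dest, and other are placeholders chosen for this sketch.

```shell
# Work in a temporary directory so nothing real is touched.
demo="$(mktemp -d)"
cd "$demo"

# 1) Target name does not exist yet: the folder is renamed.
mkdir a && touch a/file1
mv a b                      # a is gone, b/file1 exists

# 2) Target is an existing directory: the folder is moved into it.
mkdir dest
mv b dest                   # dest/b/file1 exists

# 3) Source ends in /*: the contents move, the source folder itself stays.
mkdir c other && touch c/x c/y
mv c/* other                # other/x and other/y exist, c is now empty

# List the resulting tree.
find . | sort
```

With the shell's default settings, the /* pattern does not match hidden files, so dotfiles would stay behind in the source folder.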

Create the following directories under the user's home directory
~/tmp
~/dfs/name
~/dfs/data

Make sure the directories belong to the correct user and group
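A minimal sketch of the directory step, with the base directory passed in as a parameter. The helper name make_hadoop_dirs is hypothetical, and the hadoop:hadoop owner in the comment assumes the hadoop user's group is also called hadoop.

```shell
# Create the temporary and HDFS storage directories under a base directory.
make_hadoop_dirs() {
    local base="$1"
    mkdir -p "$base/tmp" "$base/dfs/name" "$base/dfs/data"
}

# As the hadoop user, so that ownership is correct from the start:
#   make_hadoop_dirs "$HOME"
# If the directories were created as root instead, hand them over:
#   sudo chown -R hadoop:hadoop /home/hadoop/tmp /home/hadoop/dfs
```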

hadoop configuration

There are a total of 7 files that need to be modified

  1. /usr/hadoop/etc/hadoop/hadoop-env.sh
  2. /usr/hadoop/etc/hadoop/yarn-env.sh
  3. /usr/hadoop/etc/hadoop/slaves
  4. /usr/hadoop/etc/hadoop/core-site.xml
  5. /usr/hadoop/etc/hadoop/hdfs-site.xml
  6. /usr/hadoop/etc/hadoop/mapred-site.xml
  7. /usr/hadoop/etc/hadoop/yarn-site.xml

Do not leave spaces inside the <name> and <value> elements in the configuration files

  • /usr/hadoop/etc/hadoop/hadoop-env.sh

  • /usr/hadoop/etc/hadoop/yarn-env.sh

  • /usr/hadoop/etc/hadoop/slaves
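The contents of these first three files are not shown in this post. A common arrangement, assumed here rather than confirmed, is to set JAVA_HOME explicitly in hadoop-env.sh and yarn-env.sh and to list the worker hostnames in slaves, one per line. The helper name write_basic_conf and the JDK path in the usage comment are hypothetical.

```shell
# Write the assumed settings into a Hadoop configuration directory.
# On a real node conf_dir would be /usr/hadoop/etc/hadoop.
write_basic_conf() {
    local conf_dir="$1"
    local java_home="$2"
    # hadoop-env.sh and yarn-env.sh: point the daemons at the JDK.
    echo "export JAVA_HOME=${java_home}" >> "$conf_dir/hadoop-env.sh"
    echo "export JAVA_HOME=${java_home}" >> "$conf_dir/yarn-env.sh"
    # slaves: one worker hostname per line (replaces the default content).
    printf '%s\n' slave1 slave2 > "$conf_dir/slaves"
}

# Example:
#   write_basic_conf /usr/hadoop/etc/hadoop /usr/jdk1.8.0_144
```

Appending to the two env scripts keeps their existing content; the later export takes precedence when the scripts are sourced.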

  • /usr/hadoop/etc/hadoop/core-site.xml

This file uses the previously created folder ~/tmp (/home/hadoop/tmp)

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
    </property>

    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
        <description>The hadoop user can proxy users from any host</description>
    </property>

    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
        <description>The hadoop user can proxy users in any group</description>
    </property>
</configuration>
  • /usr/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/hadoop/dfs/name</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/hadoop/dfs/data</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

This file uses the previously created folder ~/dfs (/home/hadoop/dfs) and its subfolders
~/dfs/name (/home/hadoop/dfs/name)
~/dfs/data (/home/hadoop/dfs/data)

  • /usr/hadoop/etc/hadoop/mapred-site.xml

Copy and rename the template file first

cp mapred-site.xml.template mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
  • /usr/hadoop/etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>

    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>

Configure environment variables
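The exact variables are not listed in this post. A minimal sketch, assuming Hadoop lives in /usr/hadoop as in the configuration paths above, would export HADOOP_HOME and put the bin and sbin folders on the PATH so that hdfs, start-dfs.sh, and start-yarn.sh can be run from anywhere. The helper name hadoop_profile_lines is hypothetical.

```shell
# Print the lines to append to /etc/profile for Hadoop.
# Assumption: Hadoop is installed in /usr/hadoop.
hadoop_profile_lines() {
    local hadoop_home="${1:-/usr/hadoop}"
    cat <<EOF
export HADOOP_HOME=${hadoop_home}
export PATH=\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\$PATH
EOF
}

# On each node:
#   hadoop_profile_lines | sudo tee -a /etc/profile
#   source /etc/profile
#   hadoop version         # should report the installed version
```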

Test

Start HDFS

On the master node

  1. Format HDFS: hdfs namenode -format (the older, deprecated form hadoop namenode -format also works)
  2. Start HDFS: start-dfs.sh

Run the jps command on the master, slave1, and slave2 nodes

If NameNode and SecondaryNameNode are running on the master node and DataNode is running on each slave node, HDFS has started successfully

Start YARN

On the master node, run start-yarn.sh

On the slave nodes, run jps

If ResourceManager is running on the master node and NodeManager is running on each slave node, YARN has started successfully
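The jps checks for HDFS and YARN can be scripted instead of read by eye. The sketch below is one way to do it; the helper name missing_daemons is hypothetical, and it relies only on jps printing one "<pid> <name>" pair per line.

```shell
# Read jps output on stdin and print every expected daemon that is missing.
# Returns non-zero if at least one expected daemon is not running.
missing_daemons() {
    local jps_output name status=0
    jps_output="$(cat)"
    for name in "$@"; do
        # Compare against the second column exactly, so that
        # "NameNode" is not satisfied by "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qxF "$name"; then
            echo "$name"
            status=1
        fi
    done
    return "$status"
}

# On master:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# On slave1 and slave2:
#   jps | missing_daemons DataNode NodeManager
```

No output and a zero exit status mean every expected daemon was found on that node.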

Visit WebUI

Visit http://master:8088

The cluster has been built successfully