This article walks through installing and configuring a pseudo-distributed Hadoop cluster, i.e. a single-node Hadoop cluster, mainly intended for learning, development, and testing.
Install JDK
A JDK needs to be installed first; Hadoop 3.2 requires Java 8.
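On CentOS, for example, OpenJDK 8 can be installed with yum (a minimal sketch; the package name assumes the standard CentOS/RHEL repositories):
yum install -y java-1.8.0-openjdk-devel
java -version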
Download the Installation Package
At the time of writing, the latest release in the Hadoop 3.2 line is hadoop-3.2.2. The Tsinghua University mirror download URL is:
https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
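For example, it can be fetched with wget (assuming wget is available on the machine):
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz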
After downloading, verify the integrity of the package by comparing its SHA-512 checksum against the value published on the official site.
[root@centos1 ~]# sha512sum hadoop-3.2.2.tar.gz
054753301927d31a69b80be3e754fd330312f0b1047bcfa4ab978cdce18319ed912983e6022744d8f0c8765b98c87256eb1c3017979db1341d583d2cee22d029 hadoop-3.2.2.tar.gz
Configure Passwordless SSH
Test whether ssh to localhost already works without a password:
ssh localhost
If it does not, run the following commands:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
Install and Configure Hadoop
Configure HDFS
tar -zxf hadoop-3.2.2.tar.gz -C /opt
chown -R root:root /opt/hadoop-3.2.2
ln -s /opt/hadoop-3.2.2 /opt/hadoop
cd /opt/hadoop
vi etc/hadoop/hadoop-env.sh
# Add the following environment variables to hadoop-env.sh (the *_USER variables are required when running the daemons as root)
export JAVA_HOME=/usr/java/latest
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
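Note that /usr/java/latest is the symlink convention of the Oracle JDK RPMs. If the JDK lives elsewhere (e.g. a yum-installed OpenJDK), point JAVA_HOME at the actual JDK home, which can be derived like this (assuming java is on the PATH):
readlink -f $(which java) | sed 's:/jre/bin/java::; s:/bin/java::'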
vi etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
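fs.defaultFS sets the default filesystem URI for Hadoop clients: the NameNode will listen for RPC on localhost port 9000, and bare paths such as input will resolve to HDFS rather than the local filesystem.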
vi etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
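dfs.replication is lowered to 1 because this cluster has only a single DataNode; with the default of 3, every block would be reported as under-replicated.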
Format the Filesystem
bin/hdfs namenode -format
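This initializes an empty HDFS namespace. Since no dfs.namenode.name.dir is configured here, the metadata goes under hadoop.tmp.dir, which defaults to /tmp/hadoop-${user.name}; that is acceptable for a test setup but will not survive a cleanup of /tmp.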
Start HDFS
sbin/start-dfs.sh
You should see one NameNode, one SecondaryNameNode, and one DataNode start up.
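A quick way to confirm this is jps, which lists the running Java processes:
jps
# expected to show NameNode, DataNode, SecondaryNameNode (plus Jps itself)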
If startup fails with the following errors instead, check that the environment variables above were added to hadoop-env.sh.
[root@centos1 hadoop]# sbin/start-dfs.sh
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [centos1]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Open http://localhost:9870 in a browser; this is the HDFS NameNode web console.
Next, test running MapReduce jobs locally, i.e. without relying on YARN.
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/root
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'
bin/hdfs dfs -cat output/*
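The grep example extracts every string matching the regular expression dfs[a-z.]+ from the files in input and writes each match with its count to output; with the configuration above, the output will contain entries such as 1 dfs.replication (the exact contents depend on your config files).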
Configure YARN
Next, run MapReduce jobs through YARN.
vi etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
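Setting mapreduce.framework.name to yarn makes job submission go to YARN instead of the local runner, and mapreduce.application.classpath tells MapReduce tasks where to find the MapReduce jars.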
vi etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
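The mapreduce_shuffle auxiliary service lets the NodeManager serve map outputs to reducers, and yarn.nodemanager.env-whitelist lists the environment variables that containers are allowed to inherit from the NodeManager.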
Start YARN
sbin/start-yarn.sh
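Running jps again should now additionally show a ResourceManager and a NodeManager process.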
Open http://localhost:8088 in a browser; this is the ResourceManager web console.
Now test running a MapReduce job via YARN. The previous output directory must be removed first, since MapReduce refuses to write to an existing output directory:
bin/hdfs dfs -rm -r /user/root/output
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'
bin/hdfs dfs -cat output/*
The running YARN application is also visible in the ResourceManager web console.
Checking the HDFS output directory from the command line, the results are the same as the earlier local MapReduce run.
Configure Environment Variables
Configuring a few system-wide environment variables makes the HDFS command line more convenient to use.
vi /etc/profile
# Hadoop
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
source /etc/profile
HDFS commands can now be run without changing into the Hadoop installation directory.
hdfs dfs -ls /user
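Optionally, the sbin directory can be appended as well, so that the start/stop scripts (start-dfs.sh, start-yarn.sh, and friends) are also on the PATH:
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin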