I have a test Hadoop cluster deployed in my own virtual machines, and for convenience I usually start everything by simply running the start-all.sh script. So what does this script actually do? How does it set up its parameters and then start each service process in the cluster? I had only ever used start-all.sh as a shortcut, so I recently took the time to read through it end to end. Following the startup flow starting from start-all.sh is genuinely helpful for understanding cluster configuration; at the very least it makes clear what the various scripts under the bin and sbin directories are roughly for. Without further ado, let's walk through the flow. As in earlier posts, I have added comments directly in the scripts and use those comments to explain what is going on.
The content of start-all.sh is as follows:
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd` # bin holds the absolute path of the directory this script lives in
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
# Resolve HADOOP_LIBEXEC_DIR: use the value already set in the environment if any,
# otherwise fall back to the default path computed above
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh # source hadoop-config.sh to set up the common configuration
# start hdfs daemons if hdfs is present
if [ -f "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh ]; then
"${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh --config $HADOOP_CONF_DIR
fi
# start yarn daemons if yarn is present
if [ -f "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh ]; then
"${HADOOP_YARN_HOME}"/sbin/start-yarn.sh --config $HADOOP_CONF_DIR
fi
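Two shell idioms recur throughout these scripts: resolving the directory the script itself lives in, and supplying defaults with ${VAR:-default}. A minimal standalone illustration of both (not Hadoop code, just the idioms):

#!/usr/bin/env bash
# Minimal illustration (not part of Hadoop) of two idioms used above.

# 1) Resolve the directory containing the running script, even when it is sourced:
bin=$(dirname "${BASH_SOURCE-$0}")   # prefer BASH_SOURCE, fall back to $0
bin=$(cd "$bin"; pwd)                # normalize to an absolute path
echo "script directory: $bin"

# 2) ${VAR:-default}: use $VAR if it is set and non-empty, otherwise the default.
unset HADOOP_LIBEXEC_DIR
echo "${HADOOP_LIBEXEC_DIR:-$bin/../libexec}"    # prints the default path
HADOOP_LIBEXEC_DIR=/opt/hadoop/libexec
echo "${HADOOP_LIBEXEC_DIR:-$bin/../libexec}"    # prints /opt/hadoop/libexec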
This script mainly locates hadoop-config.sh and then sources it. As the name suggests, hadoop-config.sh is where the Hadoop configuration gets set up. Its content, with comments, is as follows:
this="${BASH_SOURCE-$0}"
common_bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
script="$(basename -- "$this")"
this="$common_bin/$script"
[ -f "$common_bin/hadoop-layout.sh" ] && . "$common_bin/hadoop-layout.sh"
# If the following variables are not already set, the relative paths below are resolved against the Hadoop installation directory
HADOOP_COMMON_DIR=${HADOOP_COMMON_DIR:-"share/hadoop/common"} # location of jars such as hadoop-common-2.6.0.jar
HADOOP_COMMON_LIB_JARS_DIR=${HADOOP_COMMON_LIB_JARS_DIR:-"share/hadoop/common/lib"} # third-party dependency jars under the lib directory one level below the common path above
HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_COMMON_LIB_NATIVE_DIR:-"lib/native"} # native shared libraries
HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"} # HDFS jars
HDFS_LIB_JARS_DIR=${HDFS_LIB_JARS_DIR:-"share/hadoop/hdfs/lib"} # HDFS third-party dependencies
YARN_DIR=${YARN_DIR:-"share/hadoop/yarn"} # YARN jars
YARN_LIB_JARS_DIR=${YARN_LIB_JARS_DIR:-"share/hadoop/yarn/lib"} # YARN third-party dependencies
MAPRED_DIR=${MAPRED_DIR:-"share/hadoop/mapreduce"} # MapReduce jars
MAPRED_LIB_JARS_DIR=${MAPRED_LIB_JARS_DIR:-"share/hadoop/mapreduce/lib"} # MapReduce third-party dependencies
# the root of the Hadoop installation
# See HADOOP-6255 for directory structure layout
HADOOP_DEFAULT_PREFIX=$(cd -P -- "$common_bin"/.. && pwd -P)
HADOOP_PREFIX=${HADOOP_PREFIX:-$HADOOP_DEFAULT_PREFIX}
export HADOOP_PREFIX
# The following block determines the directory that holds configuration files such as hadoop-env.sh:
# if --config was passed on the command line, that path is used; otherwise the default
# etc/hadoop directory under the Hadoop installation is used
#check to see if the conf dir is given as an optional argument
if [ $# -gt 1 ]
then
  if [ "--config" = "$1" ]
  then
    shift
    confdir=$1
    if [ ! -d "$confdir" ]; then
      echo "Error: Cannot find configuration directory: $confdir"
      exit 1
    fi
    shift
    HADOOP_CONF_DIR=$confdir
  fi
fi
# Different Hadoop versions keep the default configuration files in different places;
# check which layout is present and pick the matching default
# Allow alternate conf dir location.
if [ -e "${HADOOP_PREFIX}/conf/hadoop-env.sh" ]; then
  DEFAULT_CONF_DIR="conf"
else
  DEFAULT_CONF_DIR="etc/hadoop"
fi
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_PREFIX/$DEFAULT_CONF_DIR}"
# HADOOP_SLAVES or HADOOP_SLAVE_NAMES may be set in the environment, but not both at the same time
# User can specify hostnames or a file where the hostnames are (not both)
if [[ ( "$HADOOP_SLAVES" != '' ) && ( "$HADOOP_SLAVE_NAMES" != '' ) ]] ; then
  echo \
    "Error: Please specify one variable HADOOP_SLAVES or " \
    "HADOOP_SLAVE_NAME and not both."
  exit 1
fi
# Process command line options that specify hosts or file with host
# Read --hosts or --hostnames from the command line; they populate HADOOP_SLAVES and HADOOP_SLAVE_NAMES respectively
if [ $# -gt 1 ]
then
  if [ "--hosts" = "$1" ]
  then
    shift
    export HADOOP_SLAVES="${HADOOP_CONF_DIR}/$1"
    shift
  elif [ "--hostnames" = "$1" ]
  then
    shift
    export HADOOP_SLAVE_NAMES=$1
    shift
  fi
fi
# Check again that HADOOP_SLAVES and HADOOP_SLAVE_NAMES are not both set; if this check fails,
# we know the conflict was caused by the command-line options just parsed
# User can specify hostnames or a file where the hostnames are (not both)
# (same check as above but now we know it's command line options that cause
# the problem)
if [[ ( "$HADOOP_SLAVES" != '' ) && ( "$HADOOP_SLAVE_NAMES" != '' ) ]] ; then
  echo \
    "Error: Please specify one of --hosts or --hostnames options and not both."
  exit 1
fi
# hadoop-env.sh mainly sets environment variables (for Hadoop and for the JVM)
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
  . "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
# check if net.ipv6.bindv6only is set to 1
bindv6only=$(/sbin/sysctl -n net.ipv6.bindv6only 2> /dev/null)
if [ -n "$bindv6only" ] && [ "$bindv6only" -eq "1" ] && [ "$HADOOP_ALLOW_IPV6" != "yes" ]
then
echo "Error: \"net.ipv6.bindv6only\" is set to 1 - Java networking could be broken"
echo "For more info: http://wiki.apache.org/hadoop/HadoopIPv6"
exit 1
fi
# Newer versions of glibc use an arena memory allocator that causes virtual
# memory usage to explode. This interacts badly with the many threads that
# we use in Hadoop. Tune the variable down to prevent vmem explosion.
export MALLOC_ARENA_MAX=${MALLOC_ARENA_MAX:-4}
# Attempt to set JAVA_HOME if it is not set
if [[ -z $JAVA_HOME ]]; then
  # On OSX use java_home (or /Library for older versions)
  if [ "Darwin" == "$(uname -s)" ]; then
    if [ -x /usr/libexec/java_home ]; then
      export JAVA_HOME=($(/usr/libexec/java_home))
    else
      export JAVA_HOME=(/Library/Java/Home)
    fi
  fi
  # Bail if we did not detect it
  if [[ -z $JAVA_HOME ]]; then
    echo "Error: JAVA_HOME is not set and could not be found." 1>&2
    exit 1
  fi
fi
JAVA=$JAVA_HOME/bin/java
# some Java parameters
JAVA_HEAP_MAX=-Xmx1000m
# check envvars which might override default args
if [ "$HADOOP_HEAPSIZE" != "" ]; then
#echo "run with heapsize $HADOOP_HEAPSIZE"
JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m"
#echo $JAVA_HEAP_MAX
fi
... (rest of the script omitted)
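As a quick usage sketch of the --config handling above (the directory below is just a placeholder; any directory containing the usual *-site.xml, hadoop-env.sh and slaves files will do):

# Point the start scripts at an alternate configuration directory:
$HADOOP_PREFIX/sbin/start-dfs.sh --config /opt/hadoop-conf-test

# Same effect via the environment, since hadoop-config.sh falls back to
# HADOOP_CONF_DIR when no --config option is given:
export HADOOP_CONF_DIR=/opt/hadoop-conf-test
$HADOOP_PREFIX/sbin/start-dfs.sh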
Since the script is long and sets a great many values, only the most relevant parts are shown and annotated here. Clearly hadoop-config.sh acts as a kind of configuration hub: once it has set up all the parameters, start-all.sh calls start-dfs.sh. start-dfs.sh in turn calls hadoop-daemons.sh, which fans out to the familiar hadoop-daemon.sh on each node — the script we normally use when starting an individual daemon such as a namenode or datanode by hand. Here the scripts chain these calls automatically to start whichever daemons are needed. The content of start-dfs.sh is as follows:
#---------------------------------------------------------
# namenodes
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Starting namenodes on [$NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$NAMENODES" \
--script "$bin/hdfs" start namenode $nameStartOpt
#---------------------------------------------------------
# datanodes (using default slaves file)
if [ -n "$HADOOP_SECURE_DN_USER" ]; then
echo \
"Attempting to start secure cluster, skipping datanodes. " \
"Run start-secure-dns.sh as root to complete startup."
else
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--script "$bin/hdfs" start datanode $dataStartOpt
fi
#---------------------------------------------------------
# secondary namenodes (if any)
SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)
if [ -n "$SECONDARY_NAMENODES" ]; then
echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
--config "$HADOOP_CONF_DIR" \
--hostnames "$SECONDARY_NAMENODES" \
--script "$bin/hdfs" start secondarynamenode
fi
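Incidentally, the host lists above come from hdfs getconf, so the same commands can be run by hand to preview which machines start-dfs.sh will contact (assuming $HADOOP_PREFIX/bin is on the PATH):

hdfs getconf -namenodes              # hosts that will run a NameNode
hdfs getconf -secondarynamenodes     # hosts that will run a SecondaryNameNode, if any are configured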
Again, only the core part of the script is listed here. Depending on the arguments passed in, hadoop-daemons.sh is invoked to start the corresponding daemon on the right hosts; it is essentially a wrapper that runs hadoop-daemon.sh on every target node. The hadoop-daemons.sh script is as follows:
usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."
# if no args specified, show usage
if [ $# -le 1 ]; then
  echo $usage
  exit 1
fi
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
At the end of hadoop-daemons.sh, exec hands control to slaves.sh, whose job is to start the requested service on every slave node; this is where the hosts configured in the slaves file are read. The slaves.sh script is as follows:
usage="Usage: slaves.sh [--config confdir] command..."
# if no args specified, show usage
if [ $# -le 0 ]; then
  echo $usage
  exit 1
fi
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hadoop-config.sh
if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then
. "${HADOOP_CONF_DIR}/hadoop-env.sh"
fi
# Where to start the script, see hadoop-config.sh
# (it set up the variables based on command line options)
if [ "$HADOOP_SLAVE_NAMES" != '' ] ; then
SLAVE_NAMES=$HADOOP_SLAVE_NAMES
else
SLAVE_FILE=${HADOOP_SLAVES:-${HADOOP_CONF_DIR}/slaves}
SLAVE_NAMES=$(cat "$SLAVE_FILE" | sed 's/#.*$//;/^$/d')
fi
# start the daemons
for slave in $SLAVE_NAMES ; do
  ssh $HADOOP_SSH_OPTS $slave $"${@// /\\ }" \
    2>&1 | sed "s/^/$slave: /" &
  if [ "$HADOOP_SLAVE_SLEEP" != "" ]; then
    sleep $HADOOP_SLAVE_SLEEP
  fi
done
wait
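To see what the ${@// /\\ } substitution and the ssh fan-out actually do, here is a toy reproduction with ssh replaced by echo (the host names and paths are made up, so it is safe to run anywhere):

#!/usr/bin/env bash
# Toy reproduction (not Hadoop code) of the fan-out loop in slaves.sh.
SLAVE_NAMES="node1 node2 node3"           # normally read from the slaves file
set -- cd "/opt/hadoop dir" \; sbin/hadoop-daemon.sh start datanode

for slave in $SLAVE_NAMES ; do
  # "${@// /\\ }" inserts a backslash before every space inside each argument,
  # so the command survives the second round of word splitting performed by
  # the login shell that ssh starts on the remote host.
  echo ssh "$slave" $"${@// /\\ }" 2>&1 | sed "s/^/$slave: /" &
done
wait   # wait for every backgrounded per-host invocation to finish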
slaves.sh simply runs, on every slave node, the command line handed to it by hadoop-daemons.sh, which starts the requested daemon on each host. As shown above, when start-dfs.sh calls hadoop-daemons.sh it passes along the actual launch instruction in the form --script "$bin/hdfs" start namenode, so the real startup logic lives in the hdfs script. That script is fairly long; essentially it assembles the right parameters for whichever service was requested (and it covers quite a few services) and then calls:
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
to start the service; this exec is the last line of the hdfs script. That is the overall startup flow. Personally I think the part most worth studying is the set of configuration items in hadoop-config.sh, because getting a deeper handle on the cluster sometimes requires knowing, and occasionally adjusting, these parameters.
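For example, the HADOOP_HEAPSIZE handling shown earlier means the default 1000 MB daemon heap can be raised from hadoop-env.sh; a minimal sketch, with 2048 as a purely illustrative value:

# In ${HADOOP_CONF_DIR}/hadoop-env.sh (value in MB); hadoop-config.sh picks this
# up and turns it into JAVA_HEAP_MAX="-Xmx${HADOOP_HEAPSIZE}m"
export HADOOP_HEAPSIZE=2048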