
ZooKeeper Startup Error and How to Fix It

Posted by blue




System and software environment

zookeeper-3.4.6

jdk1.7

Red Hat Enterprise Linux Server release 6.4


Cluster environment

192.168.0.200 master200

192.168.0.201 slave201

192.168.0.202 slave202

192.168.0.203 slave203


After installation, start ZooKeeper with the start command:

[root@master200 ~]# zkServer.sh start
JMX enabled by default
Using config: /root/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Wait a few seconds, then check the ZooKeeper service status:

[root@master200 ~]# zkServer.sh status
JMX enabled by default
Using config: /root/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.

The service did not come up properly, yet the process list shows that the process is running:

[root@master200 ~]# ps -ef | grep zookeeper
root  22157  1  0 02:09 pts/0 00:00:00 /root/jdk1.7.0_79/bin/java -Dzookeeper.log.dir=. -Dzookeeper.root.logger=INFO,CONSOLE -cp /root/zookeeper-3.4.6/bin/../build/classes:/root/zookeeper-3.4.6/bin/../build/lib/*.jar:/root/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/root/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/root/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/root/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/root/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/root/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/root/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/root/zookeeper-3.4.6/bin/../conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /root/zookeeper-3.4.6/bin/../conf/zoo.cfg

So the program has started but ran into a fault while running. First, check the ZooKeeper log:

[root@slave202 ~]# tac zookeeper.out | more
2015-07-19 02:19:02,066 [myid:1] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x14ea5988bdb0000, likely client has closed socket
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:745)

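The log suggests that a connection to the node configured as [myid:1] was dropped (the warning itself only says that a client closed its socket).

Checking the ZooKeeper configuration file shows that [myid:1] is this server itself: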

[root@master200 ~]# cat zookeeper-3.4.6/conf/zoo.cfg 
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/root/zookeeper-3.4.6/zkdata
clientPort=2181
server.1=master200:2888:3888  
server.2=slave201:2888:3888
server.3=slave202:2888:3888
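
To double-check which entry belongs to the local node, the id can be read from the myid file and matched against zoo.cfg. The following is a minimal sketch, assuming the standard layout in which myid sits in the dataDir named in zoo.cfg; the function name own_server_entry is made up for illustration:

```shell
# Print the server.N line of zoo.cfg that matches this node's myid file.
# Assumption: myid sits in the dataDir named in zoo.cfg.
# The body runs in a subshell, so its variables do not leak into the caller.
own_server_entry() (
    cfg="$1"
    # Value of the dataDir=... line, with trailing whitespace removed.
    data_dir=$(sed -n 's/^[[:space:]]*dataDir[[:space:]]*=[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p' "$cfg")
    # The myid file holds nothing but the numeric server id.
    id=$(tr -d '[:space:]' < "$data_dir/myid") || exit 1
    # Print the matching server.<id>=... line, again without trailing whitespace.
    sed -n "s/^[[:space:]]*\(server\.$id[[:space:]]*=.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$cfg"
)
```

On master200, calling own_server_entry /root/zookeeper-3.4.6/conf/zoo.cfg would be expected to print the server.1 line, provided the myid file there contains 1.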

The configuration file looks fine. Next, check how the relevant ports are being used:

[root@master200 ~]# netstat -ano | grep 2888
tcp        0      0 ::ffff:192.168.0.200:54947  ::ffff:192.168.0.201:2888   ESTABLISHED off (0.00/0/0)

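As shown, no other node has connected to this host on port 2888.

When this problem came up before, turning off the firewall and starting again was enough to fix it.

This time, however, the firewall was already off and no other program was using the port, which was puzzling.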

Checking the other nodes shows that one of them did start normally:

[root@slave201 bin]# zkServer.sh status
JMX enabled by default
Using config: /root/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower

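Going through that host's ZooKeeper installation, system environment, and network in turn shows no difference from the other hosts.

Next, checking the ZooKeeper configuration file and the hosts file shows that the hosts file is different: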

[root@slave202 ~]# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.0.200 master200
192.168.0.201 slave201
192.168.0.202 slave202  
192.168.0.203 slave203

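It has one extra line: 127.0.0.1 localhost.localdomain localhost

The failing servers could not connect to themselves. Could the missing entry be the cause?

After adding this line to the hosts file on each of the other nodes and starting again, the startup succeeds: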

[root@master200 ~]# zkServer.sh status
JMX enabled by default
Using config: /root/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower

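This suggests that when the program connects to the server it is running on, it does so through 127.0.0.1 or localhost.

The official ZooKeeper documentation does not appear to state that the hosts file must contain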

127.0.0.1 localhost

but a quick look at the HBase documentation shows that it explicitly requires this entry.
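
A quick way to spot a node that lacks the entry is to scan its hosts file for it. The following is a minimal sketch; the function name has_localhost_entry is hypothetical, and it only looks for the IPv4 loopback line discussed here:

```shell
# Succeed if the given hosts file maps 127.0.0.1 to the name "localhost".
# With no argument, /etc/hosts is checked.
has_localhost_entry() {
    # Drop comments, then scan the names listed after 127.0.0.1.
    sed 's/#.*//' "${1:-/etc/hosts}" |
        awk '$1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == "localhost") found = 1 }
             END { exit (found ? 0 : 1) }'
}
```

Running has_localhost_entry || echo "localhost entry missing" on each node before starting ZooKeeper would flag the hosts files that need the extra line.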

Finally, consider what the two ports in the ${zookeeper_home}/conf/zoo.cfg configuration file are for:

server.1=master200:2888:3888 
... ...

The connections on these two ports look like this:

[root@slave201 bin]# netstat -ano | grep 2888
tcp        0      0 ::ffff:192.168.0.201:2888   :::*                        LISTEN      off (0.00/0/0)
tcp        0      0 ::ffff:192.168.0.201:2888   ::ffff:192.168.0.202:38243  ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:192.168.0.201:2888   ::ffff:192.168.0.200:54947  ESTABLISHED off (0.00/0/0)

[root@slave201 bin]# netstat -ano | grep 3888
tcp        0      0 ::ffff:192.168.0.201:3888   :::*                        LISTEN      off (0.00/0/0)
tcp        0      0 ::ffff:192.168.0.201:60662  ::ffff:192.168.0.200:3888   ESTABLISHED off (0.00/0/0)
tcp        0      0 ::ffff:192.168.0.201:3888   ::ffff:192.168.0.202:52260  ESTABLISHED off (0.00/0/0)

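Looking at how ports 2888 and 3888 are used leads to the following:

The first port, 2888, is the one that ZooKeeper on this server opens to the other nodes, and they connect to the server through it. In the output above, slave201 is listening on 2888, and both 192.168.0.200 and 192.168.0.202 have connected to it.

So what is the other port for?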

Finally, note the two port numbers after each server name: “2888” and “3888”. Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, we currently require another port for leader election. This is the second port in the server entry.

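According to the documentation, the second port is the one the servers use to communicate during leader election, for example when the current leader goes down and a new leader has to be chosen.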
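
To keep the two ports apart when reading a configuration, each server entry can be printed with its ports labelled. This is a minimal sketch; the function name describe_server_ports and the labels "quorum port" and "election port" are chosen here for illustration and are not ZooKeeper output:

```shell
# Read zoo.cfg-style lines on stdin and label the two ports of each
# server.N=host:port1:port2 entry. Lines that are not server entries are skipped.
describe_server_ports() {
    sed -n 's/^[[:space:]]*server\.\([0-9][0-9]*\)[[:space:]]*=[[:space:]]*\([^:[:space:]]*\):\([0-9][0-9]*\):\([0-9][0-9]*\).*$/server \1 (\2): quorum port \3, election port \4/p'
}
```

For the configuration shown earlier, describe_server_ports < zookeeper-3.4.6/conf/zoo.cfg would list master200, slave201 and slave202, each with 2888 as the follower-to-leader port and 3888 as the leader election port.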