cloudstack下libvirtd服务无响应问题
作者:gordonchang 发布时间:2023-07-26 05:38:44
在cloudstack4.5.2版本下,偶尔出现libvirtd服务无响应的情况,导致virsh命令无法使用,同时伴随cloudstack master丢失该slave主机连接的情况。最初怀疑是libvirtd服务或版本的问题,经过分析和排查最终确定是cloudstack-agent的问题。但是在官网上并没有找到类似的bug提交,该问题可能还存在于更高的版本,需要时间进一步从根本上分析。下面是该问题的处理过程,在此记录下,关注和使用cloudstack的朋友可以参考。
众所周知,cloudstack的社区热度远不如openstack,为什么还要选择clcoudstack?这个问题以后有机会再和大家聊。言归正传。
环境交代
宿主机操作系统:centos6.5x64(2.6.32-431.el6.x86_64)
cloudstack版本:4.5.2
libvirt版本:libvirt-0.10.2-54.el6_7.2.x86_64
问题描述
通过cloudstackapi listHosts报警信息显示:node5.cloud.rtmap:192.168.14.20 state is Down at 2016-05-13T07:19:04+0800
#有关cloudstackapi的使用方法在其它文章中总结,不在此处说明。
登陆问题宿主服务器检查:
[root@node5 log]#virsh list --all
没有响应ctrl^c退出
这时的vm可以正常工作,但处于失控状态
尝试重启启动libvirtd服务:
[root@node5 log]# service libvirtd stop
正在关闭 libvirtd 守护进程: [失败]#无法关闭libvirtd服务
尝试重启启动cloudstack-agent服务:
[root@node5 libvirt]# service cloudstack-agent restart
Stopping Cloud Agent:
Starting Cloud Agent:
libvirtd故障依旧
简单维护
[root@node5 ping]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd:错误:Unable to initialize network sockets。查看 /var/log/messages 或者运行不带 --daemon 的命令查看更多信息。
[root@node5 log]# libvirtd -d
可以执行成功,这时执行virsh list --all 可以查看和操作vm
[root@node5 log]#virsh list --all
Id 名称 状态
----------------------------------------------------
2 i-4-185-VM running
虽然vm运行正常,现在也可以通过命令正常管理了。但是对于cloudstack平台而言,宿主机处于down状态,vm处于失控状态。
临时解决办法是在其它大的升级和维护过程中重启服务器解决,根本解决还要具体问题具体分析。
分析与排查
检查进程
[root@node5 log]# ps ax |grep libvirtd
6485 ? R 863:37 libvirtd --daemon -l#该服务始终处于run状态
[root@node5 log]# top -p 6485
top -p 6485
top - 09:19:41 up 12 days, 22:27, 1 user, load average: 3.05, 5.07, 6.64
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.8%us, 1.4%sy, 0.0%ni, 93.1%id, 0.6%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 264420148k total, 182040780k used, 82379368k free, 834232k buffers
Swap: 8388600k total, 92k used, 8388508k free, 100453708k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6485 root 20 0 984m 12m 4440 R 100.2 0.0 844:22.68 libvirtd #cpu占用100%,无法释放,影响系统稳定性
杀进程
[root@node5 log]# kill -9 6485
[root@node5 log]# kill -9 6485
[root@master log]# ps ax |grep libvirtd#检查进程依然存在
6485 ? R 863:37 libvirtd --daemon -l
[root@node5 ~]# libvirtd -d -l --config /etc/libvirt/libvirtd.conf
libvirtd:错误:Unable to initialize network sockets。查看 /var/log/messages 或者运行不带 --daemon 的命令查看更多信息。
[root@node5 ~]# netstat -antp |grep 16509
tcp 0 0 0.0.0.0:16509 0.0.0.0:* LISTEN 3658/libvirtd
tcp 1 0 192.168.14.25:16509 192.168.14.22:8717 CLOSE_WAIT -
tcp 1 0 192.168.14.25:16509 192.168.14.20:5152 CLOSE_WAIT -
tcp 1 0 192.168.14.25:16509 192.168.14.10:39359 CLOSE_WAIT -
tcp 0 0 :::16509 :::* LISTEN 3658/libvirtd
tcp 39 0 ::1:16509 ::1:19715 CLOSE_WAIT -
经过上述操作,初步判断libvirtd陷入了hang死状态。
追踪进程
[root@node5 log]#strace -f libvirtd
[pid 107570] close(23058) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23059) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23060) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23061) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23062) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23063) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23064) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23065) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23066) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23067) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23068) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23069) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23070) = -1 EBADF (Bad file descriptor)
[pid 107570] close(23071) = -1 EBADF (Bad file descriptor)
^C[pid 107570] close(23072 <unfinished ...>
Process 107559 detached
Process 107560 detached
Process 107561 detached
Process 107562 detached
Process 107563 detached
Process 107564 detached
Process 107565 detached
Process 107566 detached
Process 107567 detached
Process 107568 detached
Process 107569 detached
Process 107570 detached
父进程6485在不断的产生和关闭子进程,并返回错误信息。Bad file descriptor的原因(如何触发的,谁触发的)? 循环为何无法退出?问题如何再现?
获得更多的线索
官方文档(libvirtd各种故障诊断记录和解决办法非常详尽)
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Deployment_and_Administration_Guide/sect-Troubleshooting-Common_libvirt_errors_and_troubleshooting.html#sect-libvirtd_failed_to_start
开启系统日志
Change libvirt's logging in /etc/libvirt/libvirtd.conf by enabling the line below. To enable the setting the line, open the /etc/libvirt/libvirtd.conf file in a text editor, remove the hash (or #) symbol from the beginning of the following line, and save the change:
log_outputs="3:syslog:libvirtd"
参照配置,重启服务器等待下次故障观察日志
......
Jun 1 12:42:26 node5 abrtd: New client connected
Jun 1 12:42:26 node5 abrtd: Directory 'pyhook-2016-06-01-12:42:26-70065' creation detected
Jun 1 12:42:26 node5 abrt-server[70066]: Saved Python crash dump of pid 70065 to /var/spool/abrt/pyhook-2016-06-01-12:42:26-70065
Jun 1 12:42:26 node5 abrtd: Package 'cloudstack-common' isn't signed with proper key
Jun 1 12:42:26 node5 abrtd: 'post-create' on '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065' exited with 1
Jun 1 12:42:26 node5 abrtd: Deleting problem directory '/var/spool/abrt/pyhook-2016-06-01-12:42:26-70065'
Jun 1 12:43:26 node5 abrt: detected unhandled Python exception in '/usr/share/cloudstack-common/scripts/vm/network/security_group.py'
......
Jun 6 10:36:21 node5 libvirtd: 102840: warning : qemuDomainObjBeginJobInternal:878 : Cannot start job (modify, none) for domain i-4-30-VM; current job is (modify, none) owned by (102925, 0)
Jun 6 10:36:21 node5 libvirtd: 102840: error : qemuDomainObjBeginJobInternal:883 : Timed out during operation: cannot acquire state change lock
Jun 6 10:39:59 node5 libvirtd: 114071: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:39:59 node5 libvirtd: 114071: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
Jun 6 10:40:46 node5 libvirtd: 114147: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:40:46 node5 libvirtd: 114147: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
Jun 6 10:42:15 node5 libvirtd: 114204: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:42:15 node5 libvirtd: 114204: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
Jun 6 10:47:05 node5 libvirtd: 114375: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:47:05 node5 libvirtd: 114375: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
Jun 6 10:47:23 node5 libvirtd: 114412: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 6 10:47:23 node5 libvirtd: 114412: error : virNetSocketNewListenTCP:312 : Unable to bind to port: 地址已在使用
......
Jun 12 03:08:02 node5 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="3111" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Jun 12 09:20:40 node5 libvirtd: 72575: info : libvirt version: 0.10.2, package: 54.el6_7.2 (CentOS BuildSystem <http://bugs.centos.org>, 2015-11-10-10:25:08, c6b9.bsys.dev.centos.org)
Jun 12 09:20:40 node5 libvirtd: 72575: error : virPidFileAcquirePath:410 : Failed to acquire pid file '/var/run/libvirtd.pid': 资源暂时不可用
并未获得致命错误和更多线索。(该日志配置选项还是很有必要打开的,很多问题都可以通过它来定位)
解决过程
解决思路
尝试和找到终止进程、重启服务的方法
提交bug,等待补丁升级
分析源代码,再现问题,解决问题(投入研发和时间)
由于不能再现问题,还是从简入繁吧。触发这些子进程的元凶是谁?还是cloudstack-agent的嫌疑最大,但之前重启过该服务并没有解决问题,那么agent服务是怎么一回事呢?
看下启动脚本可以基本了解,
[root@node5 libvirt]# cat /etc/rc.d/init.d/cloudstack-agent
#!/bin/bash
# chkconfig: 35 99 10
# description: Cloud Agent
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# WARNING: if this script is changed, then all other initscripts MUST BE changed to match it as well
. /etc/rc.d/init.d/functions
# set environment variables
SHORTNAME=$(basename $0 | sed -e 's/^[SK][0-9][0-9]//')
PIDFILE=/var/run/"$SHORTNAME".pid
LOCKFILE=/var/lock/subsys/"$SHORTNAME"
LOGDIR=/var/log/cloudstack/agent
LOGFILE=${LOGDIR}/agent.log
PROGNAME="Cloud Agent"
CLASS="com.cloud.agent.AgentShell"
JSVC=`which jsvc 2>/dev/null`;
# exit if we don't find jsvc
if [ -z "$JSVC" ]; then
echo no jsvc found in path;
exit 1;
fi
unset OPTIONS
[ -r /etc/sysconfig/"$SHORTNAME" ] && source /etc/sysconfig/"$SHORTNAME"
# The first existing directory is used for JAVA_HOME (if JAVA_HOME is not defined in $DEFAULT)
JDK_DIRS="/usr/lib/jvm/jre /usr/lib/jvm/java-7-openjdk /usr/lib/jvm/java-7-openjdk-i386 /usr/lib/jvm/java-7-openjdk-amd64 /usr/lib/jvm/java-6-openjdk /usr/lib/jvm/java-6-openjdk-i386 /usr/lib/jvm/java-6-openjdk-amd64 /usr/lib/jvm/java-6-sun"
for jdir in $JDK_DIRS; do
if [ -r "$jdir/bin/java" -a -z "${JAVA_HOME}" ]; then
JAVA_HOME="$jdir"
fi
done
export JAVA_HOME
ACP=`ls /usr/share/cloudstack-agent/lib/*.jar | tr '\n' ':' | sed s'/.$//'`
* =`ls /usr/share/cloudstack-agent/plugins/*.jar 2>/dev/null | tr '\n' ':' | sed s'/.$//'`
# We need to append the JSVC daemon JAR to the classpath
# AgentShell implements the JSVC daemon methods
export CLASSPATH="/usr/share/java/commons-daemon.jar:$ACP:$ * :/etc/cloudstack/agent:/usr/share/cloudstack-common/scripts"
start() {
echo -n $"Starting $PROGNAME: "
if hostname --fqdn >/dev/null 2>&1 ; then
$JSVC -Xms256m -Xmx2048m -cp "$CLASSPATH" -pidfile "$PIDFILE" \
-errfile $LOGDIR/cloudstack-agent.err -outfile $LOGDIR/cloudstack-agent.out $CLASS
RETVAL=$?
echo
else
failure
echo
echo The host name does not resolve properly to an IP address. Cannot start "$PROGNAME". > /dev/stderr
RETVAL=9
fi
[ $RETVAL = 0 ] && touch ${LOCKFILE}
return $RETVAL
}
stop() {
echo -n $"Stopping $PROGNAME: "
$JSVC -pidfile "$PIDFILE" -stop $CLASS
RETVAL=$?
echo
[ $RETVAL = 0 ] && rm -f ${LOCKFILE} ${PIDFILE}
}
case "$1" in
start)
start
;;
stop)
stop
;;
status)
status -p ${PIDFILE} $SHORTNAME
RETVAL=$?
;;
restart)
stop
sleep 3
start
;;
condrestart)
if status -p ${PIDFILE} $SHORTNAME >&/dev/null; then
stop
sleep 3
start
fi
;;
*)
echo $"Usage: $SHORTNAME {start|stop|restart|condrestart|status|help}"
RETVAL=3
esac
exit $RETVAL
[root@node5 libvirt]# ps ax |grep jsvc.exec
6655 ? Ss 0:00 jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4.3.jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7.0.jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3.22.jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-2.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2.2.jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5.2.jar:/usr/share/cloudstack-agent/lib/commons-beanutils-core-1.7.0.jar:/usr/share/cloudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2.1.jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0.10.jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1.3.jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7.0.jar:/usr/share/cloudstack-agent/lib/dom4j-1.6.1.jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6.6.jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0.1.jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7.1.jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7.2.jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3.6.jar:/usr/share/cloudstack-agent/lib/httpcore-4.3.3.jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jasypt-1.9.0.jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12.1.GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.1-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-1.jar:/usr/share/cloudstack-agent/lib/javax.persistence-2.0.0.jar:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
6657 ? Sl 0:05 jsvc.exec -Xms256m -Xmx2048m -cp /usr/share/java/commons-daemon.jar:/usr/share/cloudstack-agent/lib/activation-1.1.jar:/usr/share/cloudstack-agent/lib/antisamy-1.4.3.jar:/usr/share/cloudstack-agent/lib/aopalliance-1.0.jar:/usr/share/cloudstack-agent/lib/apache-log4j-extras-1.1.jar:/usr/share/cloudstack-agent/lib/aspectjweaver-1.7.0.jar:/usr/share/cloudstack-agent/lib/aws-java-sdk-1.3.22.jar:/usr/share/cloudstack-agent/lib/batik-css-1.7.jar:/usr/share/cloudstack-agent/lib/batik-ext-1.7.jar:/usr/share/cloudstack-agent/lib/batik-util-1.7.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk15-1.46.jar:/usr/share/cloudstack-agent/lib/bcprov-jdk16-1.46.jar:/usr/share/cloudstack-agent/lib/bsh-core-2.0b4.jar:/usr/share/cloudstack-agent/lib/cglib-nodep-2.2.2.jar:/usr/share/cloudstack-agent/lib/cloud-agent-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-core-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-components-api-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-engine-schema-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-cluster-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-config-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-db-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-events-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-ipc-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-jobs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-managed-context-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-rest-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-framework-security-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-hypervisor-kvm-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-plugin-network-ovs-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-server-4.5.2.jar:/usr/share/cloudstack-agent/lib/cloud-utils-4.5.2.jar:/usr/share/cloudstack-agent/lib/commons-beanutils-core-1.7.0.jar:/usr/share/cloudstack-agent/lib/commons-codec-1.6.jar:/usr/share/cloudstack-agent/lib/commons-collections-3.2.1.jar:/usr/share/cloudstack-agent/lib/commons-configuration-1.8.jar:/usr/share/cloudstack-agent/lib/commons-daemon-1.0.10.jar:/usr/share/cloudstack-agent/lib/commons-dbcp-1.4.jar:/usr/share/cloudstack-agent/lib/commons-fileupload-1.2.jar:/usr/share/cloudstack-agent/lib/commons-httpclient-3.1.jar:/usr/share/cloudstack-agent/lib/commons-io-1.4.jar:/usr/share/cloudstack-agent/lib/commons-lang-2.6.jar:/usr/share/cloudstack-agent/lib/commons-logging-1.1.3.jar:/usr/share/cloudstack-agent/lib/commons-net-3.3.jar:/usr/share/cloudstack-agent/lib/commons-pool-1.6.jar:/usr/share/cloudstack-agent/lib/cxf-bundle-jaxrs-2.7.0.jar:/usr/share/cloudstack-agent/lib/dom4j-1.6.1.jar:/usr/share/cloudstack-agent/lib/ehcache-core-2.6.6.jar:/usr/share/cloudstack-agent/lib/ejb-api-3.0.jar:/usr/share/cloudstack-agent/lib/esapi-2.0.1.jar:/usr/share/cloudstack-agent/lib/geronimo-javamail_1.4_spec-1.7.1.jar:/usr/share/cloudstack-agent/lib/geronimo-servlet_3.0_spec-1.0.jar:/usr/share/cloudstack-agent/lib/gson-1.7.2.jar:/usr/share/cloudstack-agent/lib/guava-14.0-rc1.jar:/usr/share/cloudstack-agent/lib/httpclient-4.3.6.jar:/usr/share/cloudstack-agent/lib/httpcore-4.3.3.jar:/usr/share/cloudstack-agent/lib/jackson-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-core-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-databind-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-jaxrs-json-provider-2.1.1.jar:/usr/share/cloudstack-agent/lib/jackson-mapper-asl-1.8.9.jar:/usr/share/cloudstack-agent/lib/jackson-module-jaxb-annotations-2.1.1.jar:/usr/share/cloudstack-agent/lib/jasypt-1.9.0.jar:/usr/share/cloudstack-agent/lib/java-ipv6-0.10.jar:/usr/share/cloudstack-agent/lib/javassist-3.12.1.GA.jar:/usr/share/cloudstack-agent/lib/javassist-3.18.1-GA.jar:/usr/share/cloudstack-agent/lib/javax.inject-1.jar:/usr/share/cloudstack-agent/lib/javax.persistence-2.0.0.jar:/usr/share/cloudstack-agent/lib/javax.ws.rs-api-2.0-m10.jar
重启服务
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 6657) 正在运行...
[root@node5 bin]# service cloudstack-agent stop
Stopping Cloud Agent:
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 6657) 正在运行..
ps ax |grep jsvc.exec 也验证了进程依然存在
眼前一亮的同时,也发现了之前使用restart带来的问题,stop不成功的问题被掩盖了~~~有没有懊恼? 不过来不及反思,接下来的问题还远不是这么简单......
[root@node5 bin]# kill -9 6655 6657
[root@node5 bin]# kill -9 6655 6657
-bash: kill: (6655) - 没有那个进程
-bash: kill: (6657) - 没有那个进程
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent 已死,但 pid 文件仍存
[root@node5 bin]# rm /var/run/cloudstack-agent.pid
rm:是否删除普通文件 "/var/run/cloudstack-agent.pid"?y
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent 已死,但是 subsys 被锁
[root@node5 bin]# service cloudstack-agent start
[root@node5 bin]# service cloudstack-agent status
cloudstack-agent (pid 109382) 正在运行...
[root@node5 bin]# netstat -antp |grep 8250
tcp 0 0 192.168.14.20:22220 192.168.14.10:8250 ESTABLISHED 109382/jsvc.exec
处理后状态恢复正常,但是libvirtd仍然无法杀掉, 很快netstat -antp |grep 8250 状态再次消失,cloudstack master平台监控主机记录由Up状态转为disconnect状态。不过毕竟不是down状态,较之前已经有了进步。
启动一个libvirtd -d看下,
[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
6485 ? R 863:37 libvirtd --daemon -l
130057 ? Sl 0:38 libvirtd -d
28904 pts/0 S+ 0:00 grep libvirtd
然后在cloudstack master平台上手工点击强制重新连接该主机,成功了。主机监控状态由disconnect转为Up,这时再次尝试杀掉6485仍然是不成功的,于是又在cloudstack master管理平台上尝试着点击操作了一下暂停vm命令,vm成功暂停。再返回服务器上观察原来hung死的libvirtd进程已经消失。
[root@node5 bin]# libvirtd -d
[root@node5 bin]# ps ax |grep libvirtd
130057 ? Sl 0:38 libvirtd -d
28904 pts/0 S+ 0:00 grep libvirtd
至此既恢复了平台对该主机的管控,也终止了libvirtd异常进程。问题初步归于cloudstack-agent在处理发送个libvirtd的信号上存在些小问题。以后再单独分析下jsvc进程,再现问题和根本解决。
问题反思
在处理服务异常的问题上,命令行参数不要用restart,用stop和kill来调试。说起来都是泪!
猜你喜欢
- 随着时代的发展,目前的智能手机无论从功能还配置都已经接近甚至超越一台小型的掌上电脑了,而就从目前的发展趋势来看,手机超越上网本是迟早的事,人
- 自从seo大师丘仕达把挖土豆和163邮箱和其他的单页面站成功的优化后,就引发了SEO界的强烈反响,众多的追随者和后来者前仆后继的学习SEO,
- 虚拟机是非常方便的测试软件,我们可以用来做各种测试,不会对硬件造成伤害,为了能够更加方便的演示环境,便于做各种例子;而又能保证主机的快速运行
- 北京时间11月13日消息,据国外媒体报道,Kershaw, Cutter & Ratinoff律师事务所在其网站上称,正在寻找&am
- 前言:由于Linux下很多软件安装必须网络环境下进行,因此,对于如何在VMware下进行上网,我折腾了至少三天,今天上午,终于搜到一遍技术文
- 受ASP * 页的吸引,笔者决定用Dreamweaver做一个ASP格式的学校主页。然而,当笔者将Web服务器中Htm格式的网页换成新做的A
- 自从我们发布了服务器端广告设置这个功能以后,大家修改代码就更加方便了。Google AdSense代码投放的最好办法是直接通过帐户获取代码并
- 9月29日消息,微软官方今天正式宣布,旗下的免费杀毒软件Microsoft Security Essentials(MSE)正式版今日全球同
- Vmware12虚拟机及Oracle安装教程分享给大家,具体内容如下1、安装XP系统步骤一:步骤二: 内存分配大一些,运行会流畅一些(推荐使
- 据OpenSUSE官方发布的消息称,lizardsource.cn成为中国大陆内OpenSuSE最大镜像服务器。目前,国内也有大大小小几个O
- 服务器的稳定性会直接影响网站排名情况,是seo重要的一环,应该得到从事seo人们的重视.笔者以前也了解过服务器稳定性对排名有着较为严重的影响
- 从系统安装到用户安全设置、系统权限设置三方面进行讲解Web服务器防木马及漏洞攻击的权限配置,希望通过这篇文章可以让你的服务器更加安全。一、系
- asp之家注:百度做为最具影响力的中文搜索引擎,做为个人站长都想通过百度来给自己的网站带来流量,特别是新站,因为没人知道你的网站,你也不可能
- 有人问到“国内以产品为王的公司有哪些?”此问题放大来谈,就是说以产品为核心竞争力的互联网有公司有哪些?问题很尖锐,因为目前以产品为核心竞争力
- 博客(Blog)、维基(Wiki)、播客(Podcast)、RSS阅读改变了人们对传统门户网站的依赖,那么从Gmail开始,人们将改变传统电
- Linux下解压查看JAR包问题在生产环境,有时会有需要在线查看jar中内容的需求.此时, unzip/tar命令都失去了用武之地.解决办法
- 默认的方式下linux会把文件访问的时间atime做记录,这在绝大部分的场合都是没有必要的,如果遇到机器IO负载高或是CPU WAIT高的情
- Dedecms 5.5 实现自定义标题处理,在 {dede:function}@me='64edu.com'.@me{/de
- 作为一个网管员,你是否知道在你的主机或服务器上发生的事情——谁来访问过?他们都做过些什么?目的是什么?什么?你不知道!其实Windows 2
- 域名系统(DNS)是一种用于TCP/IP应用程序的分布式数据库,它提供主机名字和IP地址之间的转换信息。通常,网络用户通过UDP协议和DNS