Message #15713
Re: Trouble getting instances back up after hard server reboot
Hi Samuel,
I am interested in common/best practices for this as well. I'm posting
this to -operators to see if anyone there has input.
While having instances affected by a compute node reboot does not sound
very cloudy, it is unfortunately an issue that can happen often.
I have added some notes inline.
On Thu, Aug 9, 2012 at 1:55 PM, Samuel Winchenbach <swinchen@xxxxxxxxx> wrote:
> Hi all,
>
>
> I am having a terrible time getting my instances to work after a hard
> reboot. I am using the most up-to-date version of all OpenStack
> packages provided by Ubuntu. I have included a list of packages, with
> versions, at the end of this email.
>
> After a hard reboot, "nova list" reports that the instance is active,
> but there are no kvm processes running. Grepping the log file for
> errors, I find this in nova-compute.log:
>
If the reboot is quick, nova will still report the instances as active. If
the reboot takes 10 minutes or so, nova notices that the instances are down
and marks them in a Shut Down state with a continuously spinning circle.
I've found that in both scenarios, issuing a reboot either via Horizon or
the CLI resolves the issue most of the time -- nova will send a reboot
request to KVM, which then re-launches the instance.
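
As a rough sketch of the CLI route, something like the following could
reboot a batch of instances after the host comes back. The function name
and the instance IDs are made up for illustration, and the --hard flag is
my assumption about what works best when no KVM process is left; a plain
"nova reboot" is what I described above.

```shell
# Hard-reboot each instance ID passed as an argument.
# Assumes the `nova` CLI is installed and credentials are already exported.
reboot_instances() {
    rc=0
    for id in "$@"; do
        echo "hard-rebooting instance $id"
        # Remember any failure but keep going with the remaining instances.
        nova reboot --hard "$id" || rc=1
    done
    return "$rc"
}

# Example (hypothetical instance IDs):
# reboot_instances 3f1c2a7e 9b0d4e11
```

The function returns non-zero if any single reboot request fails, so it
can be checked from a wrapper script.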
>
>
> 2012-08-09 14:32:51 INFO nova.rpc.common [req-dd6fcade-73ec-4378-9a6b-7bc709eefcd4 None None] Connected to AMQP server on cloudy-priv:5672
> 2012-08-09 14:33:51 ERROR nova.rpc.common [req-dd6fcade-73ec-4378-9a6b-7bc709eefcd4 None None] Timed out waiting for RPC response: timed out
> 2012-08-09 14:33:51 TRACE nova.rpc.common Traceback (most recent call last):
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 490, in ensure
> 2012-08-09 14:33:51 TRACE nova.rpc.common return method(*args, **kwargs)
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 567, in _consume
> 2012-08-09 14:33:51 TRACE nova.rpc.common return self.connection.drain_events(timeout=timeout)
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 175, in drain_events
> 2012-08-09 14:33:51 TRACE nova.rpc.common return self.transport.drain_events(self.connection, **kwargs)
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 238, in drain_events
> 2012-08-09 14:33:51 TRACE nova.rpc.common return connection.drain_events(**kwargs)
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 57, in drain_events
> 2012-08-09 14:33:51 TRACE nova.rpc.common return self.wait_multi(self.channels.values(), timeout=timeout)
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 63, in wait_multi
> 2012-08-09 14:33:51 TRACE nova.rpc.common chanmap.keys(), allowed_methods, timeout=timeout)
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 120, in _wait_multiple
> 2012-08-09 14:33:51 TRACE nova.rpc.common channel, method_sig, args, content = read_timeout(timeout)
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 94, in read_timeout
> 2012-08-09 14:33:51 TRACE nova.rpc.common return self.method_reader.read_method()
> 2012-08-09 14:33:51 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
> 2012-08-09 14:33:51 TRACE nova.rpc.common raise m
> 2012-08-09 14:33:51 TRACE nova.rpc.common timeout: timed out
> 2012-08-09 14:33:51 TRACE nova.rpc.common
> 2012-08-09 14:33:51 CRITICAL nova [-] Timeout while waiting on RPC response.
>
> Restarting nova-compute brings the instance up, so it looks like
> nova-compute is starting before rabbitmq? Is there a clean way
> around this, or should I put "service nova-compute restart" in
> rc.local?
>
>
>
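
If the ordering guess is right, one possible workaround (a sketch only,
not something I have tested on your setup) is to wait for the AMQP port
before restarting nova-compute. The helper names probe_port and
wait_for_port are invented here, and the probe assumes an nc (netcat)
build that supports -z; the host and port are taken from your log above.

```shell
# Probe a TCP port once. Assumes `nc` with -z support is installed.
probe_port() {
    nc -z "$1" "$2" >/dev/null 2>&1
}

# Retry the probe once per second, up to $3 attempts (default 60).
# Returns 0 as soon as host $1, port $2 accepts a connection, 1 otherwise.
wait_for_port() {
    host=$1 port=$2 tries=${3:-60}
    while [ "$tries" -gt 0 ]; do
        probe_port "$host" "$port" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Possible rc.local usage:
# wait_for_port cloudy-priv 5672 120 && service nova-compute restart
```

This only papers over the start order; expressing the dependency in the
init system would be cleaner if your packaging allows it.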
> If I have a volume attached things get much worse. I can still start
> the instance by restarting nova-compute, but the volume does not
> attach.
Yes, dealing with volumes after a reboot plain sucks. Most of the time, I
end up manually setting the volume as detached and available in the volumes
table of the database. Sometimes I have to log into the server that hosts
the volume and cut the iSCSI connection.
And if there was I/O traffic between the instance and volume at the time of
reboot, you'll most likely need to fsck the volume when it is reattached to
the instance.
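
For the database step, here is a hedged sketch of what I mean. It only
prints the SQL so it can be reviewed first. The function name is made up,
and the column names (status, attach_status), the numeric volume id, and
the database name "nova" are assumptions to check against your own
schema; other columns may also need clearing.

```shell
# Print the SQL that marks one volume as available and detached.
# Column names are assumptions; verify them against your volumes table.
volume_reset_sql() {
    # Accept only a non-empty, purely numeric id to avoid mangled SQL.
    case $1 in
        ''|*[!0-9]*) echo "volume id must be numeric" >&2; return 1 ;;
    esac
    printf "UPDATE volumes SET status='available', attach_status='detached' WHERE id=%s;\n" "$1"
}

# Review the statement, then feed it to the nova database, e.g.:
# volume_reset_sql 12 | mysql nova
```

Back up the database before running anything like this by hand.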
> I cannot seem to detach the volume in order to attach it
> again. Below is the only error in the log file, and how I mount the
> image that contains the nova-volume logical group. The error occurs
> because it tries to start nova-volume before the loopback device is
> set up.
My only recommendation for this is to not use a loopback device. Use a real
LVM partition instead.
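
A minimal sketch of setting that up, assuming a spare partition is
available. The function name and /dev/sdb1 are placeholders; the volume
group name nova-volumes matches the one in your log. Note that pvcreate
wipes whatever is on the device.

```shell
# Create the volume group that nova-volume expects on a dedicated device.
# WARNING: pvcreate destroys existing data on the device.
create_nova_volumes_vg() {
    dev=$1
    [ -n "$dev" ] || { echo "usage: create_nova_volumes_vg <device>" >&2; return 1; }
    # Initialise the device as an LVM physical volume, then build the VG on it.
    pvcreate "$dev" && vgcreate nova-volumes "$dev"
}

# Example with a hypothetical spare partition:
# create_nova_volumes_vg /dev/sdb1
```

With the group on a real device, LVM should find it at boot without any
losetup step, so nova-volume no longer depends on rc.local.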
> The command in rc.local restarts the service, making the
> logical group available.
>
> From nova-volume.log
>
> 2012-08-09 14:32:40 CRITICAL nova [-] volume group nova-volumes doesn't exist
>
> From rc.local
>
> losetup -f /var/lib/nova/nova-volumes.img
> service nova-volume restart
>
> Any idea how I should solve these problems? I could disable upstart
> from bringing the services up automatically and start them in the
> correct order in rc.local, but I don't think this would solve the
> volume attachment issue.
>
> I am so frustrated that I created this script for testing which
> completely resets the nova database table, iptables, and recreates
> everything.
> http://paste2.org/p/2100211
>
> I know it is a dirty dirty hack, but I can't seem to figure out what
> is going on.
>
> Thanks in advance for the help.
> Sam
>
>
> root@cloudy:/var/log/nova# dpkg -l | grep -E "(nova|glance|keystone|tgt|rabbit|ntp|mysql|libvirt|kvm)"
> ii  glance  2012.1+stable~20120608-5462295-0ubuntu2.2  OpenStack Image Registry and Delivery Service - Daemons
> ii  glance-api  2012.1+stable~20120608-5462295-0ubuntu2.2  OpenStack Image Registry and Delivery Service - API
> ii  glance-client  2012.1+stable~20120608-5462295-0ubuntu2.2  OpenStack Image Registry and Delivery Service - Registry
> ii  glance-common  2012.1+stable~20120608-5462295-0ubuntu2.2  OpenStack Image Registry and Delivery Service - Common
> ii  glance-registry  2012.1+stable~20120608-5462295-0ubuntu2.2  OpenStack Image Registry and Delivery Service - Registry
> ii  keystone  2012.1+stable~20120608-aff45d6-0ubuntu1  OpenStack identity service - Daemons
> ii  kvm  1:84+dfsg-0ubuntu16+1.0+noroms+0ubuntu14.1  dummy transitional package from kvm to qemu-kvm
> ii  kvm-ipxe  1.0.0+git-3.55f6c88-0ubuntu1  PXE ROM's for KVM
> ii  libdbd-mysql-perl  4.020-1build2  Perl5 database interface to the MySQL database
> ii  libmysqlclient18  5.5.24-0ubuntu0.12.04.1  MySQL database client library
> ii  libsys-virt-perl  0.9.7-2  Perl module providing an extension for the libvirt library
> ii  libvirt-bin  0.9.8-2ubuntu17.3  programs for the libvirt library
> ii  libvirt0  0.9.8-2ubuntu17.3  library for interfacing with different virtualization systems
> ii  mysql-client-5.5  5.5.24-0ubuntu0.12.04.1  MySQL database client binaries
> ii  mysql-client-core-5.5  5.5.24-0ubuntu0.12.04.1  MySQL database core client binaries
> ii  mysql-common  5.5.24-0ubuntu0.12.04.1  MySQL database common files, e.g. /etc/mysql/my.cnf
> ii  mysql-server  5.5.24-0ubuntu0.12.04.1  MySQL database server (metapackage depending on the latest version)
> ii  mysql-server-5.5  5.5.24-0ubuntu0.12.04.1  MySQL database server binaries and system database setup
> ii  mysql-server-core-5.5  5.5.24-0ubuntu0.12.04.1  MySQL database server binaries
> ii  nova-api  2012.1+stable~20120612-3ee026e-0ubuntu1.2  OpenStack Compute - API frontend
> ii  nova-common  2012.1+stable~20120612-3ee026e-0ubuntu1.2  OpenStack Compute - common files
> ii  nova-compute  2012.1+stable~20120612-3ee026e-0ubuntu1.2  OpenStack Compute - compute node
> ii  nova-compute-kvm  2012.1+stable~20120612-3ee026e-0ubuntu1.2  OpenStack Compute - compute node (KVM)
> ii  nova-network  2012.1+stable~20120612-3ee026e-0ubuntu1.2  OpenStack Compute - Network manager
> ii  nova-scheduler  2012.1+stable~20120612-3ee026e-0ubuntu1.2  OpenStack Compute - virtual machine scheduler
> ii  nova-volume  2012.1+stable~20120612-3ee026e-0ubuntu1.2  OpenStack Compute - storage
> ii  ntp  1:4.2.6.p3+dfsg-1ubuntu3.1  Network Time Protocol daemon and utility programs
> ii  ntpdate  1:4.2.6.p3+dfsg-1ubuntu3.1  client for setting system time from NTP servers
> ii  python-glance  2012.1+stable~20120608-5462295-0ubuntu2.2  OpenStack Image Registry and Delivery Service - Python library
> ii  python-keystone  2012.1+stable~20120608-aff45d6-0ubuntu1  OpenStack identity service - Python library
> ii  python-keystoneclient  2012.1-0ubuntu1  Client libary for Openstack Keystone API
> ii  python-libvirt  0.9.8-2ubuntu17.3  libvirt Python bindings
> ii  python-mysqldb  1.2.3-1build1  Python interface to MySQL
> ii  python-nova  2012.1+stable~20120612-3ee026e-0ubuntu1.2  OpenStack Compute Python libraries
> ii  python-novaclient  2012.1-0ubuntu1  client library for OpenStack Compute API
> ii  qemu-kvm  1.0+noroms-0ubuntu14.1  Full virtualization on i386 and amd64 hardware
> ii  rabbitmq-server  2.7.1-0ubuntu4  An AMQP server written in Erlang
> ii  tgt  1:1.0.17-1ubuntu2  Linux SCSI target user-space tools
>
--
Joe Topjian
Systems Administrator
Cybera Inc.
www.cybera.ca
Cybera is a not-for-profit organization that works to spur and support
innovation, for the economic benefit of Alberta, through the use
of cyberinfrastructure.