
openstack team mailing list archive

Re: Nova-compute doesn't start on reboot, only manually

 

Excerpts from Alessandro Tagliapietra's message of 2012-05-28 01:41:08 -0700:
> Hello, I've installed OpenStack following the Ubuntu 12.04 deployment guide. The only problem is that nova-compute has to be started manually; by default it doesn't start on boot. This is the error log:
> 
> 2012-05-27 23:47:14 INFO nova.rpc.common [req-46624af9-9d2a-4901-b635-66f557d3b54c None None] Connected to AMQP server on 10.8.0.1:5672
> 2012-05-27 23:48:14 ERROR nova.rpc.common [req-46624af9-9d2a-4901-b635-66f557d3b54c None None] Timed out waiting for RPC response: timed out

So this is a reasonable, but I think still not long enough, timeout of
one minute...

> 2012-05-27 23:48:14 TRACE nova.rpc.common Traceback (most recent call last):
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 490, in ensure
> 2012-05-27 23:48:14 TRACE nova.rpc.common     return method(*args, **kwargs)
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 567, in _consume
> 2012-05-27 23:48:14 TRACE nova.rpc.common     return self.connection.drain_events(timeout=timeout)
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 175, in drain_events
> 2012-05-27 23:48:14 TRACE nova.rpc.common     return self.transport.drain_events(self.connection, **kwargs)
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 238, in drain_events
> 2012-05-27 23:48:14 TRACE nova.rpc.common     return connection.drain_events(**kwargs)
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 57, in drain_events
> 2012-05-27 23:48:14 TRACE nova.rpc.common     return self.wait_multi(self.channels.values(), timeout=timeout)
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 63, in wait_multi
> 2012-05-27 23:48:14 TRACE nova.rpc.common     chanmap.keys(), allowed_methods, timeout=timeout)
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 120, in _wait_multiple
> 2012-05-27 23:48:14 TRACE nova.rpc.common     channel, method_sig, args, content = read_timeout(timeout)
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 94, in read_timeout
> 2012-05-27 23:48:14 TRACE nova.rpc.common     return self.method_reader.read_method()
> 2012-05-27 23:48:14 TRACE nova.rpc.common   File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
> 2012-05-27 23:48:14 TRACE nova.rpc.common     raise m
> 2012-05-27 23:48:14 TRACE nova.rpc.common timeout: timed out
> 2012-05-27 23:48:14 TRACE nova.rpc.common     
> 2012-05-27 23:48:14 CRITICAL nova [-] Timeout while waiting on RPC response.
> 2012-05-27 23:48:14 TRACE nova Traceback (most recent call last):
> 2012-05-27 23:48:14 TRACE nova   File "/usr/bin/nova-compute", line 49, in <module>   
> 2012-05-27 23:48:14 TRACE nova     service.wait()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 413, in wait  
> 2012-05-27 23:48:14 TRACE nova     _launcher.wait()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 131, in wait
> 2012-05-27 23:48:14 TRACE nova     service.wait()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
> 2012-05-27 23:48:14 TRACE nova     return self._exit_event.wait()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
> 2012-05-27 23:48:14 TRACE nova     return hubs.get_hub().switch()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
> 2012-05-27 23:48:14 TRACE nova     return self.greenlet.switch()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
> 2012-05-27 23:48:14 TRACE nova     result = function(*args, **kwargs)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 101, in run_server
> 2012-05-27 23:48:14 TRACE nova     server.start()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 162, in start
> 2012-05-27 23:48:14 TRACE nova     self.manager.init_host()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 247, in init_host
> 2012-05-27 23:48:14 TRACE nova     self.reboot_instance(context, instance['uuid'])
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
> 2012-05-27 23:48:14 TRACE nova     return f(*args, **kw)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 153, in decorated_function
> 2012-05-27 23:48:14 TRACE nova     function(self, context, instance_uuid, *args, **kwargs)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in decorated_function
> 2012-05-27 23:48:14 TRACE nova     sys.exc_info())
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
> 2012-05-27 23:48:14 TRACE nova     self.gen.next()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
> 2012-05-27 23:48:14 TRACE nova     return function(self, context, instance_uuid, *args, **kwargs)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 896, in reboot_instance
> 2012-05-27 23:48:14 TRACE nova     network_info = self._get_instance_nw_info(context, instance)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 313, in _get_instance_nw_info
> 2012-05-27 23:48:14 TRACE nova     instance)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 219, in get_instance_nw_info
> 2012-05-27 23:48:14 TRACE nova     'args': args})
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/rpc/__init__.py", line 68, in call
> 2012-05-27 23:48:14 TRACE nova     return _get_impl().call(context, topic, msg, timeout)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 674, in call
> 2012-05-27 23:48:14 TRACE nova     return rpc_amqp.call(context, topic, msg, timeout, Connection.pool)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 338, in call
> 2012-05-27 23:48:14 TRACE nova     rv = list(rv)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 299, in __iter__
> 2012-05-27 23:48:14 TRACE nova     self._iterator.next()
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 572, in iterconsume
> 2012-05-27 23:48:14 TRACE nova     yield self.ensure(_error_callback, _consume)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 503, in ensure
> 2012-05-27 23:48:14 TRACE nova     error_callback(e)
> 2012-05-27 23:48:14 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 553, in _error_callback
> 2012-05-27 23:48:14 TRACE nova     raise rpc_common.Timeout()
> 2012-05-27 23:48:14 TRACE nova Timeout: Timeout while waiting on RPC response.
> 
> 
> Then, once the system has booted, a manual "start nova-compute" makes everything work.
> 

Looks to me like you need to make sure the other side of that RPC
connection is up before nova-compute starts. I am not familiar with the
specifics of what Nova needs at startup, but judging by the traceback
(the call that times out is get_instance_nw_info in nova/network/api.py),
I'd guess this is nova-network. That's easy enough to do on a single
system (just adjust the upstart jobs or init scripts), but across
multiple systems you'll need some kind of orchestration layer, and even
then, modeling the dependencies on the network with yet another tool
seems like it's just begging to break.
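On a single box, "messing with the upstart jobs" boils down to starting services in dependency order. A minimal Python sketch of that idea, assuming a hypothetical dependency map (RabbitMQ, then nova-network, then nova-compute; the real set depends on the deployment and is not taken from any packaged job) and Python 3.9+ for the standard-library graphlib module:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for one host: each service lists the
# services that must already be up before it is started. This is an
# illustration only, not copied from any Ubuntu upstart job.
DEPENDENCIES = {
    "rabbitmq-server": set(),
    "nova-network": {"rabbitmq-server"},
    "nova-compute": {"rabbitmq-server", "nova-network"},
}


def start_order(deps):
    """Return the services so that each one appears after all of its
    dependencies. graphlib raises CycleError if the map has a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for name in start_order(DEPENDENCIES):
        print("start", name)
```

This only orders services on one host. Once the dependencies span several machines, the same graph has to be enforced by something outside upstart, which is exactly the fragile part described above.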

Instead, the timeout should simply be several minutes during startup,
and the services should all be able to start in parallel if they are on
the same box. I always think of one of those HP EcoPODs that come
pre-installed with everything you need for OpenStack, get shipped, and
are then simply turned on. You could spend a lot of time trying to get
the startup order just right, or you could have every service extend
its timeouts and get as far as it can without contact with the other
services.
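The alternative proposed here is to keep retrying through a generous startup window rather than exiting after the first one-minute RPC timeout. A minimal sketch, assuming a hypothetical zero-argument `call` that raises a hypothetical `RPCTimeout` when the peer does not answer (a stand-in for nova's own timeout exception, not nova's API); `startup_window` bounds the total wait and `clock`/`sleep` are injectable so the logic can be exercised without real delays:

```python
import time


class RPCTimeout(Exception):
    """Hypothetical stand-in for an RPC 'no response' error."""


def call_with_startup_window(call, startup_window=600.0, retry_delay=5.0,
                             clock=time.monotonic, sleep=time.sleep):
    """Retry `call` while it raises RPCTimeout, until `startup_window`
    seconds have passed since the first attempt. Once the next retry
    would overshoot the window, the last RPCTimeout is re-raised."""
    deadline = clock() + startup_window
    while True:
        try:
            return call()
        except RPCTimeout:
            # The peer may simply not be up yet: wait and try again,
            # unless the startup window has been used up.
            if clock() + retry_delay > deadline:
                raise
            sleep(retry_delay)
```

In the failure above, the wrapped call would be the network-info request made from init_host; with a window of several minutes, services on the same box could start in parallel and each would wait for the others instead of dying.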

nova-compute doesn't *know* that the other side is in error; it only
knows that the other side is not responding. This is not a problem with
nova-compute, so why should nova-compute fail so quickly? One could even
argue that nova-compute should wait *forever* for the other side. From
an ops standpoint, both services are "down", so why make the operations
team take two actions once the service that is actually broken recovers?
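Taken literally, "wait forever" means retrying with no deadline at all, ideally with a capped backoff so a long outage does not flood the logs or the message bus. A minimal sketch under that assumption; `is_transient` is a hypothetical predicate deciding which errors mean "peer not responding yet", and the `base`/`cap` values are illustrative, not nova settings:

```python
import itertools
import time


def backoff_delays(base=1.0, factor=2.0, cap=60.0):
    """Yield an endless sequence of retry delays that grows
    geometrically from `base` but never exceeds `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def wait_forever(call, is_transient, sleep=time.sleep, base=1.0, cap=60.0):
    """Retry `call` until it succeeds. Errors for which `is_transient`
    returns True are retried after the next backoff delay; any other
    error propagates immediately."""
    delays = backoff_delays(base=base, cap=cap)
    while True:
        try:
            return call()
        except Exception as exc:
            if not is_transient(exc):
                raise
            sleep(next(delays))


if __name__ == "__main__":
    # First few delays of the schedule with a small cap.
    print(list(itertools.islice(backoff_delays(1, 2, 10), 6)))
```

Non-transient errors (a bad config, say) still fail fast; only "nobody answered" is treated as something to wait out, which matches the argument that nova-compute is not the broken party.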

