openstack team mailing list archive
Message #12406
Re: Nova-compute doesn't start on reboot, only manually
Excerpts from Alessandro Tagliapietra's message of 2012-05-28 01:41:08 -0700:
> Hello, I've installed OpenStack following the Ubuntu 12.04 deploy guide. The only problem is that nova-compute has to be started manually; by default it doesn't start on boot. This is the error log:
>
> 2012-05-27 23:47:14 INFO nova.rpc.common [req-46624af9-9d2a-4901-b635-66f557d3b54c None None] Connected to AMQP server on 10.8.0.1:5672
> 2012-05-27 23:48:14 ERROR nova.rpc.common [req-46624af9-9d2a-4901-b635-66f557d3b54c None None] Timed out waiting for RPC response: timed out
So this is a timeout of one minute, which is reasonable, but I still
think it is not long enough...
> 2012-05-27 23:48:14 TRACE nova.rpc.common Traceback (most recent call last):
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 490, in ensure
> 2012-05-27 23:48:14 TRACE nova.rpc.common return method(*args, **kwargs)
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 567, in _consume
> 2012-05-27 23:48:14 TRACE nova.rpc.common return self.connection.drain_events(timeout=timeout)
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/connection.py", line 175, in drain_events
> 2012-05-27 23:48:14 TRACE nova.rpc.common return self.transport.drain_events(self.connection, **kwargs)
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 238, in drain_events
> 2012-05-27 23:48:14 TRACE nova.rpc.common return connection.drain_events(**kwargs)
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 57, in drain_events
> 2012-05-27 23:48:14 TRACE nova.rpc.common return self.wait_multi(self.channels.values(), timeout=timeout)
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 63, in wait_multi
> 2012-05-27 23:48:14 TRACE nova.rpc.common chanmap.keys(), allowed_methods, timeout=timeout)
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 120, in _wait_multiple
> 2012-05-27 23:48:14 TRACE nova.rpc.common channel, method_sig, args, content = read_timeout(timeout)
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/kombu/transport/pyamqplib.py", line 94, in read_timeout
> 2012-05-27 23:48:14 TRACE nova.rpc.common return self.method_reader.read_method()
> 2012-05-27 23:48:14 TRACE nova.rpc.common File "/usr/lib/python2.7/dist-packages/amqplib/client_0_8/method_framing.py", line 221, in read_method
> 2012-05-27 23:48:14 TRACE nova.rpc.common raise m
> 2012-05-27 23:48:14 TRACE nova.rpc.common timeout: timed out
> 2012-05-27 23:48:14 TRACE nova.rpc.common
> 2012-05-27 23:48:14 CRITICAL nova [-] Timeout while waiting on RPC response.
> 2012-05-27 23:48:14 TRACE nova Traceback (most recent call last):
> 2012-05-27 23:48:14 TRACE nova File "/usr/bin/nova-compute", line 49, in <module>
> 2012-05-27 23:48:14 TRACE nova service.wait()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 413, in wait
> 2012-05-27 23:48:14 TRACE nova _launcher.wait()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 131, in wait
> 2012-05-27 23:48:14 TRACE nova service.wait()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
> 2012-05-27 23:48:14 TRACE nova return self._exit_event.wait()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
> 2012-05-27 23:48:14 TRACE nova return hubs.get_hub().switch()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
> 2012-05-27 23:48:14 TRACE nova return self.greenlet.switch()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
> 2012-05-27 23:48:14 TRACE nova result = function(*args, **kwargs)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 101, in run_server
> 2012-05-27 23:48:14 TRACE nova server.start()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/service.py", line 162, in start
> 2012-05-27 23:48:14 TRACE nova self.manager.init_host()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 247, in init_host
> 2012-05-27 23:48:14 TRACE nova self.reboot_instance(context, instance['uuid'])
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/exception.py", line 114, in wrapped
> 2012-05-27 23:48:14 TRACE nova return f(*args, **kw)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 153, in decorated_function
> 2012-05-27 23:48:14 TRACE nova function(self, context, instance_uuid, *args, **kwargs)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 177, in decorated_function
> 2012-05-27 23:48:14 TRACE nova sys.exc_info())
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
> 2012-05-27 23:48:14 TRACE nova self.gen.next()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 171, in decorated_function
> 2012-05-27 23:48:14 TRACE nova return function(self, context, instance_uuid, *args, **kwargs)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 896, in reboot_instance
> 2012-05-27 23:48:14 TRACE nova network_info = self._get_instance_nw_info(context, instance)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 313, in _get_instance_nw_info
> 2012-05-27 23:48:14 TRACE nova instance)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/network/api.py", line 219, in get_instance_nw_info
> 2012-05-27 23:48:14 TRACE nova 'args': args})
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/rpc/__init__.py", line 68, in call
> 2012-05-27 23:48:14 TRACE nova return _get_impl().call(context, topic, msg, timeout)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 674, in call
> 2012-05-27 23:48:14 TRACE nova return rpc_amqp.call(context, topic, msg, timeout, Connection.pool)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 338, in call
> 2012-05-27 23:48:14 TRACE nova rv = list(rv)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/rpc/amqp.py", line 299, in __iter__
> 2012-05-27 23:48:14 TRACE nova self._iterator.next()
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 572, in iterconsume
> 2012-05-27 23:48:14 TRACE nova yield self.ensure(_error_callback, _consume)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 503, in ensure
> 2012-05-27 23:48:14 TRACE nova error_callback(e)
> 2012-05-27 23:48:14 TRACE nova File "/usr/lib/python2.7/dist-packages/nova/rpc/impl_kombu.py", line 553, in _error_callback
> 2012-05-27 23:48:14 TRACE nova raise rpc_common.Timeout()
> 2012-05-27 23:48:14 TRACE nova Timeout: Timeout while waiting on RPC response.
>
>
> Then, after system boot, running "start nova-compute" makes everything work.
>
It looks to me like you need to make sure the other side of that RPC
connection is up before nova-compute starts. I am not familiar with the
specifics of what Nova needs at startup, but I'd guess this is nova-api
or keystone. That's a pretty easy thing to do on a single system (just
adjust the upstart jobs or init scripts), but across multiple systems
you'll need some kind of orchestration layer, and even then, modeling the
dependencies on the network with some other tool seems like something
just begging to break.

Instead, the timeout should just be multiple minutes during startup, and
the services should all be able to start in parallel if they are on the
same box. I always think of one of those HP EcoPODs that comes
pre-installed with everything you need for OpenStack, and is just shipped
and then turned on. You could spend a lot of time trying to get that
order just right, or you could just have every service extend its
timeouts and get as far as it can without contact with the other services.

nova-compute doesn't *know* that the other side is in error; it just
knows that it is not responding. This is not a problem with nova-compute,
so why should nova-compute fail so quickly? One could even argue that
nova-compute should wait *forever* for the other side. From an ops
standpoint, they're both "down", so why make the operations team take
two actions when the actual broken service recovers?
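
To make the idea concrete, here is a minimal sketch of what "be patient
during startup" could look like. It is not Nova's code: RpcTimeout and
call_with_patience are hypothetical names invented for illustration, and
rpc_call stands in for whatever function performs the blocking RPC. The
sketch retries with a capped backoff until an overall deadline passes, and
deadline=None gives the "wait forever" variant.

```python
import time


class RpcTimeout(Exception):
    """Hypothetical stand-in for the RPC Timeout seen in the traceback."""


def call_with_patience(rpc_call, per_try_timeout=60, deadline=300,
                       max_backoff=30, sleep=time.sleep, clock=time.time):
    """Keep retrying rpc_call until it answers or the deadline passes.

    rpc_call is any callable accepting a `timeout` keyword (an assumption
    made for this sketch). deadline is in seconds; None means never give up.
    """
    start = clock()
    backoff = 1
    while True:
        try:
            return rpc_call(timeout=per_try_timeout)
        except RpcTimeout:
            # Give up only once the overall startup deadline has passed.
            if deadline is not None and clock() - start >= deadline:
                raise
            sleep(backoff)
            # Exponential backoff, capped so recovery is noticed promptly.
            backoff = min(backoff * 2, max_backoff)
```

With something like this around the startup calls, a service that comes
up before its peers keeps trying instead of exiting, so only the service
that is actually broken needs operator attention.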