yahoo-eng-team team mailing list archive - Message #16491
[Bug 1332036] Re: Instance status would never changed from BUILD after rabbitmq restarted
Hi Andrea, sorry for the lack of details.
I hadn't considered that this behavior might be related to config values.
Please see the following details:
======================
1) When rabbitmq received SIGKILL, nova-compute logged an ERROR, but this behavior is expected:
2014-06-23 11:23:11.691 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to consume message from queue: Socket closed
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 639, in ensure
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit return method(*args, **kwargs)
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 718, in _consume
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit return self.connection.drain_events(timeout=timeout)
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/connection.py", line 279, in drain_events
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit return self.transport.drain_events(self.connection, **kwargs)
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/kombu/transport/pyamqp.py", line 91, in drain_events
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit return connection.drain_events(**kwargs)
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/amqp/connection.py", line 299, in drain_events
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit chanmap, None, timeout=timeout,
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/amqp/connection.py", line 362, in _wait_multiple
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit channel, method_sig, args, content = read_timeout(timeout)
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/amqp/connection.py", line 326, in read_timeout
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit return self.method_reader.read_method()
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/local/lib/python2.7/dist-packages/amqp/method_framing.py", line 189, in read_method
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit raise m
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit IOError: Socket closed
2014-06-23 11:23:11.691 TRACE oslo.messaging._drivers.impl_rabbit
2) nova-compute started reconnecting immediately, which is good:
2014-06-23 11:23:12.414 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.10.221:5672
2014-06-23 11:23:12.414 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-06-23 11:23:12.844 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.10.221:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
2014-06-23 11:23:13.428 ERROR oslo.messaging._drivers.impl_rabbit [-] AMQP server on 192.168.10.221:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.
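The reconnect loop visible in these logs follows a standard retry-with-interval pattern. A minimal sketch of that pattern (illustrative only, not oslo.messaging's actual implementation; the function name and defaults are my assumptions):

```python
import time


def ensure_connected(connect, retry_interval=1.0, max_retries=0):
    """Retry `connect` until it succeeds.

    Illustrative sketch only, not oslo.messaging code. max_retries == 0
    means retry forever, which matches the repeated "Trying again in 1
    seconds" lines in the log above.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except IOError as exc:
            if max_retries and attempt >= max_retries:
                raise
            print("AMQP server unreachable: %s. "
                  "Trying again in %s seconds." % (exc, retry_interval))
            time.sleep(retry_interval)
```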
3) After rabbitmq was restarted, nova-compute reconnected to it, which is exactly what we want:
2014-06-23 11:24:25.653 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.10.221:5672
2014-06-23 11:24:25.653 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-06-23 11:24:25.789 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.10.221:5672
2014-06-23 11:24:26.706 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.10.221:5672
2014-06-23 11:24:31.916 INFO oslo.messaging._drivers.impl_rabbit [-] Reconnecting to AMQP server on 192.168.10.221:5672
2014-06-23 11:24:31.917 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 1.0 seconds...
2014-06-23 11:24:32.947 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.10.221:5672
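For reference, the 1.0-second delay shown in these logs is governed by the broker retry options in nova.conf. The values below are the defaults as I understand them; double-check against your oslo.messaging version:

```ini
[DEFAULT]
# Seconds to wait before the first reconnect attempt.
rabbit_retry_interval = 1
# Seconds added to the interval on each subsequent attempt.
rabbit_retry_backoff = 2
# 0 means retry forever; otherwise give up after this many attempts.
rabbit_max_retries = 0
```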
4) However, the instance boot froze, with the following log:
2014-06-23 11:24:34.012 WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 63.325336 sec
2014-06-23 11:25:05.777 DEBUG nova.openstack.common.lockutils [req-b23de1c3-55d8-4d78-9386-621b907f2f25 admin admin] Got semaphore "7b68d649-cbf4-4e97-aa66-bf9b88199db7" from (pid=3704) lock /opt/stack/nova/nova/openstack/common/lockutils.py:168
2014-06-23 11:25:05.777 DEBUG nova.openstack.common.lockutils [req-b23de1c3-55d8-4d78-9386-621b907f2f25 admin admin] Got semaphore / lock "do_run_instance" from (pid=3704) inner /opt/stack/nova/nova/openstack/common/lockutils.py:248
2014-06-23 11:25:05.780 AUDIT nova.compute.manager [req-b23de1c3-55d8-4d78-9386-621b907f2f25 admin admin] [instance: 7b68d649-cbf4-4e97-aa66-bf9b88199db7] Starting instance...
5) The VM status never changed, and nova-compute's log showed no further progress.
======================
In short, the problem is that the VM status never changes even after
rabbitmq connectivity is re-established.
However, I noticed that this behavior no longer occurs on current master (I tested commit 73d95d2817452f78c468ce162265be431f625c8a).
So I am changing the bug report's status to "Invalid".
** Changed in: nova
Status: New => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1332036
Title:
Instance status would never changed from BUILD after rabbitmq
restarted
Status in OpenStack Compute (Nova):
Invalid
Bug description:
[Issue]
I'm checking system behavior in failure cases.
The instance status never changes from BUILD after rabbitmq is restarted.
[How to reproduce]
1) Start the services, including nova and rabbitmq.
2) Kill the rabbitmq service to simulate a system failure.
$ ps aux|grep beam|grep -v grep
rabbitmq 15816 2.4 1.3 146496 27048 ? Sl 20:05 0:01 /usr/lib/erlang/erts-5.10.4/bin/beam -W w -K true -A30 -P 1048576 -- -root /usr/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.2.4/sbin/../ebin -noshell -noinput -s rabbit boot -sname rabbit@interdev -boot start_sasl -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit@xxxxxxxxxxxx"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@xxxxxxxxxxxxxxxxx"} -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/lib/rabbitmq_server-3.2.4/sbin/../plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@interdev-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@interdev"
$ sudo kill -SIGKILL 15816
3) Restart the rabbitmq service.
$ sudo /etc/init.d/rabbitmq-server start
* Starting message broker rabbitmq-server [ OK ]
4) Request an instance boot.
$ nova boot --flavor 1 --image 41793321-51b1-413a-a69f-0c25d7a63b22 aaa
+--------------------------------------+-------------------------------------------------+
| Property | Value |
+--------------------------------------+-------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | nova |
| OS-EXT-SRV-ATTR:host | - |
| OS-EXT-SRV-ATTR:hypervisor_hostname | - |
| OS-EXT-SRV-ATTR:instance_name | instance-00000005 |
| OS-EXT-STS:power_state | 0 |
| OS-EXT-STS:task_state | scheduling |
| OS-EXT-STS:vm_state | building |
| OS-SRV-USG:launched_at | - |
| OS-SRV-USG:terminated_at | - |
| accessIPv4 | |
| accessIPv6 | |
| adminPass | 2gV49MZNHLQm |
| config_drive | |
| created | 2014-06-19T11:06:51Z |
| flavor | m1.tiny (1) |
| hostId | |
| id | 157f3d5d-5d26-4a55-9d7c-ab0dc3d9fdc9 |
| image | CorePlus (41793321-51b1-413a-a69f-0c25d7a63b22) |
| key_name | - |
| metadata | {} |
| name | aaa |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| security_groups | default |
| status | BUILD |
| tenant_id | 5924ee51bb6945f39dc7010687ef43af |
| updated | 2014-06-19T11:06:54Z |
| user_id | 35b1c3ef8b9d442cad696d049f7b2041 |
+--------------------------------------+-------------------------------------------------+
5) However, the instance status never changes from BUILD/NOSTATE:
$ nova list
+--------------------------------------+------+--------+------------+-------------+----------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+------+--------+------------+-------------+----------+
| 157f3d5d-5d26-4a55-9d7c-ab0dc3d9fdc9 | aaa | BUILD | - | NOSTATE | |
+--------------------------------------+------+--------+------------+-------------+----------+
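To confirm the stuck state without eyeballing the table, the Status column can be pulled out of the `nova list` output programmatically. A small sketch (the parsing helper is mine, not from the report; the instance name "aaa" comes from the repro steps above):

```python
def status_of(table_text, name):
    """Return the Status column for the row whose Name column equals
    `name`, given the ASCII table printed by `nova list`; None if the
    instance is not found."""
    for line in table_text.splitlines():
        cols = [c.strip() for c in line.split("|")]
        # Data rows look like: | ID | Name | Status | Task State | ... |
        # so after splitting on "|", cols[2] is Name and cols[3] is Status.
        if len(cols) > 4 and cols[2] == name:
            return cols[3]
    return None


# Usage (assumes the nova CLI is on PATH):
# import subprocess
# print(status_of(subprocess.check_output(["nova", "list"]).decode(), "aaa"))
```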
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1332036/+subscriptions