yahoo-eng-team team mailing list archive
Message #16363
[Bug 1331537] [NEW] nova service-list shows nova-compute as down and is required to be restarted frequently in order to provision new vms
Public bug reported:
Nova compute services in OpenStack Havana frequently go down, as listed
by "nova service-list", and have to be restarted multiple times every
day. All the compute nodes have their NTP times in sync.
When a node shows as down, we cannot use it for launching new VMs, and
we quickly run out of compute resources. Hence our workaround is to
restart nova-compute on those servers hourly.
On a nova-compute node I found the following error; its timestamp lines up with the "Updated_at" field from nova service-list.
2014-06-07 00:21:15.690 511340 ERROR nova.servicegroup.drivers.db [-] model server went away
2014-06-07 00:21:15.690 511340 TRACE nova.servicegroup.drivers.db Traceback (most recent call last):
2014-06-07 00:21:15.690 511340 TRACE nova.servicegroup.drivers.db   File "/usr/lib/python2.7/dist-packages/nova/servicegroup/drivers/db.py", line 92, in _report_state
2014-06-07 00:21:15.690 511340 TRACE nova.servicegroup.drivers.db     report_count = service.service_ref['report_count'] + 1
2014-06-07 00:21:15.690 511340 TRACE nova.servicegroup.drivers.db TypeError: 'NoneType' object has no attribute '__getitem__'
2014-06-07 00:21:15.690 511340 TRACE nova.servicegroup.drivers.db
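To make the failure in the traceback above concrete, here is a minimal sketch of the same expression failing when service_ref is None, next to one possible guarded variant. FakeService and both function names are hypothetical stand-ins for illustration; this is not nova's code and not a confirmed fix. Note that Python 2.7 words the error as in the log, while Python 3 reports "'NoneType' object is not subscriptable".

```python
# Hypothetical stand-in for the object the traceback calls `service`.
# This is NOT nova's actual class; it only carries a service_ref attribute.
class FakeService(object):
    def __init__(self, service_ref=None):
        self.service_ref = service_ref


def next_report_count(service):
    """Mirror the failing expression from the traceback.

    Raises TypeError when service.service_ref is None.
    """
    return service.service_ref['report_count'] + 1


def next_report_count_guarded(service):
    """Return the incremented count, or None if service_ref is missing.

    Assumption: skipping the update is acceptable to the caller; whether
    that is the right behaviour for nova is not established here.
    """
    if service.service_ref is None:
        return None
    return service.service_ref['report_count'] + 1
```

The guard only avoids the exception; it does not explain why service_ref ended up as None in the first place, which is the part that still needs investigating.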
It looks like the services shown as down have not been able to update the database with their latest status. On at least two compute nodes that I have checked, the time they stopped updating matches the traceback above (2014-06-07 00:21:15.690).
+------------------+------------------------+--------------+----------+-------+----------------------------+-----------------+
| Binary           | Host                   | Zone         | Status   | State | Updated_at                 | Disabled Reason |
+------------------+------------------------+--------------+----------+-------+----------------------------+-----------------+
| nova-compute     | nova1                  | blabla       | enabled  | up    | 2014-06-07T00:37:42.000000 | None            |
| nova-compute     | nova2                  | blabla       | enabled  | down  | 2014-06-07T00:21:05.000000 | None            |
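As a rough illustration of why a stale "Updated_at" shows up as "down": as far as I understand the database servicegroup driver, a service counts as up only if its last heartbeat is within service_down_time (60 seconds by default) of the current time. The sketch below applies that rule to the two rows above. The function name, the constant, and the value of "now" are assumptions for illustration, not taken from nova.

```python
from datetime import datetime, timedelta

# Assumption: mirrors nova's service_down_time option (60 seconds by default).
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(updated_at, now, down_time=SERVICE_DOWN_TIME):
    """Treat a service as up if its last heartbeat is recent enough."""
    last_heartbeat = datetime.strptime(updated_at, "%Y-%m-%dT%H:%M:%S.%f")
    return abs(now - last_heartbeat) <= down_time


# Assumption: the listing was taken a few seconds after nova1's last update.
now = datetime(2014, 6, 7, 0, 37, 45)
print(is_up("2014-06-07T00:37:42.000000", now))  # nova1 -> True
print(is_up("2014-06-07T00:21:05.000000", now))  # nova2 -> False
```

Under these assumptions nova2's last successful update is more than 16 minutes old, so it is reported as down even though the nova-compute process itself may still be running.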
** Affects: nova
Importance: Undecided
Status: New
** Tags: compute
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1331537
Title:
nova service-list shows nova-compute as down and is required to be
restarted frequently in order to provision new vms
Status in OpenStack Compute (Nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1331537/+subscriptions