yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #76108
[Bug 1807044] [NEW] nova-api startup does not scan cells looking for minimum nova-compute service version
Public bug reported:
This CI job failed devstack setup because nova-api took longer than 60
seconds to start (it took 64 seconds):
http://logs.openstack.org/01/619701/5/gate/tempest-
slow/2bb461b/controller/logs/screen-n-api.txt.gz
Looking at what could be taking time in there, it was noticed that this
is logged a lot:
Dec 05 20:14:00.919520 ubuntu-xenial-ovh-bhs1-0000959981
devstack@n-api.service[23459]: DEBUG nova.compute.rpcapi [None req-
dfdfad07-2ff4-43ed-9f67-2acd59687e0c None None] Not caching compute RPC
version_cap, because min service_version is 0. Please ensure a nova-
compute service has been started. Defaulting to current version.
{{(pid=23462) _determine_version_cap
/opt/stack/nova/nova/compute/rpcapi.py:397}}
That's coming from here:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/compute/rpcapi.py#L396
Which is when the compute rpcapi client is initialized, which happens
when nova.compute.api.API is initialized:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/compute/api.py#L266
Which happens for most of the API extensions, e.g.:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/api/openstack/compute/servers.py#L78
So that init and DB query happens num_workers * num_extensions times (we
have 2 workers in this case and it looks like there are at least 29
instantiations of the compute API code in the extensions).
The bigger problem is that in this case, nova-api is configured to hit
cell0 for the [database]/connection:
http://logs.openstack.org/01/619701/5/gate/tempest-
slow/2bb461b/controller/logs/etc/nova/nova_conf.txt.gz
[database]
connection = mysql+pymysql://root:secretdatabase@127.0.0.1/nova_cell0?charset=utf8
And there will not be nova-compute services in the cell0 database (if
configured properly).
So this query is always going to return 0, at least for devstack:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/compute/rpcapi.py#L387
We should really be scanning the cells to get the minimum nova-compute
version using this:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/objects/service.py#L481
But even on first startup of nova-api before any computes are started
and registered with a cell, that initial query will return 0 which means
we won't cache the result and continue to do that query and log that
message for extensions * workers.
So there are really kind of two issues here:
1. We're not iterating cells properly for that version check. This is
the more important issue.
2. We're needlessly doing this query on initial startup which slows down
the startup (and contributes to timeouts the devstack jobs on slow
nodes) and lots of excessive logging.
** Affects: nova
Importance: Medium
Status: Confirmed
** Tags: api cells performance
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1807044
Title:
nova-api startup does not scan cells looking for minimum nova-compute
service version
Status in OpenStack Compute (nova):
Confirmed
Bug description:
This CI job failed devstack setup because nova-api took longer than 60
seconds to start (it took 64 seconds):
http://logs.openstack.org/01/619701/5/gate/tempest-
slow/2bb461b/controller/logs/screen-n-api.txt.gz
Looking at what could be taking time in there, it was noticed that
this is logged a lot:
Dec 05 20:14:00.919520 ubuntu-xenial-ovh-bhs1-0000959981
devstack@n-api.service[23459]: DEBUG nova.compute.rpcapi [None req-
dfdfad07-2ff4-43ed-9f67-2acd59687e0c None None] Not caching compute
RPC version_cap, because min service_version is 0. Please ensure a
nova-compute service has been started. Defaulting to current version.
{{(pid=23462) _determine_version_cap
/opt/stack/nova/nova/compute/rpcapi.py:397}}
That's coming from here:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/compute/rpcapi.py#L396
Which is when the compute rpcapi client is initialized, which happens
when nova.compute.api.API is initialized:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/compute/api.py#L266
Which happens for most of the API extensions, e.g.:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/api/openstack/compute/servers.py#L78
So that init and DB query happens num_workers * num_extensions times
(we have 2 workers in this case and it looks like there are at least
29 instantiations of the compute API code in the extensions).
The bigger problem is that in this case, nova-api is configured to hit
cell0 for the [database]/connection:
http://logs.openstack.org/01/619701/5/gate/tempest-
slow/2bb461b/controller/logs/etc/nova/nova_conf.txt.gz
[database]
connection = mysql+pymysql://root:secretdatabase@127.0.0.1/nova_cell0?charset=utf8
And there will not be nova-compute services in the cell0 database (if
configured properly).
So this query is always going to return 0, at least for devstack:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/compute/rpcapi.py#L387
We should really be scanning the cells to get the minimum nova-compute
version using this:
https://github.com/openstack/nova/blob/aed3e240138ad84352688611246363339414c98f/nova/objects/service.py#L481
But even on first startup of nova-api before any computes are started
and registered with a cell, that initial query will return 0 which
means we won't cache the result and continue to do that query and log
that message for extensions * workers.
So there are really kind of two issues here:
1. We're not iterating cells properly for that version check. This is
the more important issue.
2. We're needlessly doing this query on initial startup which slows
down the startup (and contributes to timeouts the devstack jobs on
slow nodes) and lots of excessive logging.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1807044/+subscriptions
Follow ups