[Bug 2052718] [NEW] Nova Compute Service status goes up and down abnormally
Public bug reported:
Hi community,
When you type:
$ watch openstack nova compute service list
You will see the state of services such as "nova-conductor",
"nova-scheduler" and "nova-compute" flip continuously between "up" and
"down". This is not normal, and it never stops. While it happens, a
chain of operations is affected, such as live migration and creating
virtual machines, even though on inspection the compute server itself
is still working.
After some debugging I discovered that the current time "now" retrieved
from the system is earlier than the recorded "last_heartbeat"! Because
the source code does not point this out, it took me a long time to
understand the problem. As soon as the current time is obtained, it
should be compared with the last heartbeat. If the current time is
earlier than the last heartbeat, a warning should be logged so that the
operator knows to fix time synchronization for the cluster.
The relevant code is around the abs(elapsed) line in:
https://github.com/openstack/nova/blob/stable/2023.2/nova/servicegroup/drivers/db.py
...
...
    def is_up(self, service_ref):
        ...
        ...
        # Timestamps in DB are UTC.
        elapsed = timeutils.delta_seconds(last_heartbeat, timeutils.utcnow())
        is_up = abs(elapsed) <= self.service_down_time
        if not is_up:
            LOG.debug('Seems service %(binary)s on host %(host)s is down. '
                      'Last heartbeat was %(lhb)s. Elapsed time is %(el)s',
                      {'binary': service_ref.get('binary'),
                       'host': service_ref.get('host'),
                       'lhb': str(last_heartbeat), 'el': str(elapsed)})
        return is_up
...
...
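To illustrate the kind of check I am asking for, here is a minimal
standalone sketch. It is not a patch against nova: the function name
check_heartbeat, its return value and the threshold of 60 seconds are
assumptions made only for this example, and it uses plain datetime
arithmetic instead of oslo's timeutils. It only shows that a negative
elapsed time can be detected and logged before abs() hides its sign.

```python
import logging
from datetime import datetime, timedelta

LOG = logging.getLogger(__name__)


def check_heartbeat(last_heartbeat, now, service_down_time=60):
    """Return (is_up, skewed) for one heartbeat timestamp.

    Hypothetical helper for illustration only, not part of nova.
    Both arguments are naive UTC datetimes, as in the DB driver.
    """
    # Same quantity as delta_seconds(last_heartbeat, now): now - last_heartbeat.
    elapsed = (now - last_heartbeat).total_seconds()

    # A negative value means the heartbeat appears to come from the future,
    # which suggests the clocks in the cluster are not synchronized.
    skewed = elapsed < 0
    if skewed:
        LOG.warning('Last heartbeat %s is later than current time %s '
                    '(elapsed %s s). Check time synchronization.',
                    last_heartbeat, now, elapsed)

    # Same comparison as the existing code, kept unchanged here.
    is_up = abs(elapsed) <= service_down_time
    return is_up, skewed


now = datetime(2024, 2, 8, 12, 0, 0)
normal = check_heartbeat(now - timedelta(seconds=10), now)
future_far = check_heartbeat(now + timedelta(seconds=90), now)
future_near = check_heartbeat(now + timedelta(seconds=10), now)
```

With a heartbeat 90 seconds in the future, the service is reported as
down even though it has just checked in, and the warning makes the
cause visible instead of leaving only a debug message.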
** Affects: nova
Importance: Undecided
Status: New
** Attachment added: "Up/down continuously"
https://bugs.launchpad.net/bugs/2052718/+attachment/5745271/+files/MicrosoftTeams-image.png
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2052718