
yahoo-eng-team team mailing list archive

[Bug 2052718] Re: Nova Compute Service status goes up and down abnormally

 

I don't believe this is in the scope of nova to fix.

The requirement to have consistent time synchronisation is well known, and
it strongly feels like a problem that should be addressed in an
installation tool, not in code.

We mention in the docs that the controllers should be running shared services like NTP:
https://docs.openstack.org/nova/latest/install/overview.html#controller

If you have not ensured your clocks are in sync as part of the installation process, via NTP, PTP or another method, then I would not consider OpenStack to be correctly installed.

** Changed in: nova
       Status: New => Opinion

** Changed in: nova
   Importance: Undecided => Wishlist

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2052718

Title:
  Compute service status still up with negative elapsed time

Status in OpenStack Compute (nova):
  Opinion

Bug description:
  Hi community,

  When you run:
  $ openstack compute service list

  the service is reported as "up", but that result comes from the wrong
  logic because the elapsed time is a negative number. This happens
  because abs(elapsed) turns it into a positive value.

  See the abs(elapsed) line in is_up() ->
  https://github.com/openstack/nova/blob/stable/2023.2/nova/servicegroup/drivers/db.py

  ...
  ...
  def is_up(self, service_ref):
          ...
          ...
          # Timestamps in DB are UTC.
          elapsed = timeutils.delta_seconds(last_heartbeat, timeutils.utcnow())
          is_up = abs(elapsed) <= self.service_down_time
          if not is_up:
              LOG.debug('Seems service %(binary)s on host %(host)s is down. '
                        'Last heartbeat was %(lhb)s. Elapsed time is %(el)s',
                        {'binary': service_ref.get('binary'),
                         'host': service_ref.get('host'),
                         'lhb': str(last_heartbeat), 'el': str(elapsed)})
          return is_up
  ...
  ...

  service_down_time (threshold): 60s
  https://github.com/openstack/nova/blob/stable/2023.2/nova/conf/service.py#L40

  =========================== Bad result ===========================

  Example (1) bug:

  last_heartbeat: 10:00:00 AM
  now: 9:59:30 AM
  elapsed: -30(s)
  abs(-30s) <= 60s
  ===> result: up

  Example (2) bug:

  last_heartbeat: 10:01:00 AM
  now: 9:59:58 AM
  elapsed: -62(s)
  abs(-62s) > 60s
  ===> result: down
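
The arithmetic in the two examples can be reproduced with the standard library alone. This is a minimal sketch, not nova code: is_up_current is a hypothetical stand-in for the quoted check, the calendar date is arbitrary, and the subtraction assumes timeutils.delta_seconds(before, after) returns after minus before in seconds.

```python
from datetime import datetime

SERVICE_DOWN_TIME = 60  # seconds, the threshold quoted in the report


def is_up_current(last_heartbeat, now, service_down_time=SERVICE_DOWN_TIME):
    """Hypothetical stand-in for the quoted check: abs(elapsed) <= threshold."""
    # Assumed equivalent of timeutils.delta_seconds(last_heartbeat, now).
    elapsed = (now - last_heartbeat).total_seconds()
    return abs(elapsed) <= service_down_time, elapsed


# Example (1): the heartbeat timestamp is 30 s ahead of the local clock.
up1, elapsed1 = is_up_current(datetime(2024, 2, 8, 10, 0, 0),
                              datetime(2024, 2, 8, 9, 59, 30))

# Example (2): the heartbeat timestamp is 62 s ahead of the local clock.
up2, elapsed2 = is_up_current(datetime(2024, 2, 8, 10, 1, 0),
                              datetime(2024, 2, 8, 9, 59, 58))

print(up1, elapsed1)
print(up2, elapsed2)
```

The same clock skew produces "up" in one case and "down" in the other, depending only on whether its magnitude crosses the threshold.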

  ========================= Expected result =========================

  Example (1) expected behavior:

  last_heartbeat: 10:00:00 AM
  now: 9:59:30 AM
  elapsed: -30(s) < 0
  ===> result: log an error and report down

  Example (2) expected behavior:

  last_heartbeat: 10:01:00 AM
  now: 9:59:58 AM
  elapsed: -62(s) < 0
  ===> result: log an error and report down
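
The behavior requested above could look roughly like the following. This is only a sketch of the reporter's proposal under stated assumptions, not a nova patch: is_up_strict and its parameters are hypothetical names, and it uses plain datetime and logging rather than nova's own helpers.

```python
import logging
from datetime import datetime

LOG = logging.getLogger(__name__)


def is_up_strict(last_heartbeat, now, service_down_time=60):
    """Hypothetical variant: a heartbeat from the future counts as down."""
    elapsed = (now - last_heartbeat).total_seconds()
    if elapsed < 0:
        # A negative value means the heartbeat is later than the local
        # clock, which points at unsynchronised clocks between hosts.
        LOG.error('Last heartbeat %s is later than current time %s '
                  '(elapsed %s s); clocks may be out of sync.',
                  last_heartbeat, now, elapsed)
        return False
    return elapsed <= service_down_time


# Both examples from the report are reported as down.
skewed_30 = is_up_strict(datetime(2024, 2, 8, 10, 0, 0),
                         datetime(2024, 2, 8, 9, 59, 30))
skewed_62 = is_up_strict(datetime(2024, 2, 8, 10, 1, 0),
                         datetime(2024, 2, 8, 9, 59, 58))

# A heartbeat 30 s in the past is still reported as up.
healthy = is_up_strict(datetime(2024, 2, 8, 10, 0, 0),
                       datetime(2024, 2, 8, 10, 0, 30))
```

With this variant, a host whose clock runs ahead would show as down until its clock is corrected, which makes the skew visible but does not fix it.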

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2052718/+subscriptions


