yahoo-eng-team team mailing list archive
Message #75326
[Bug 1798806] [NEW] Race condition between RT and scheduler
Public bug reported:
The HostState object used by the scheduler derives its own values from the
'stats' property of the compute node, e.g.:

    self.stats = compute.stats or {}
    self.num_instances = int(self.stats.get('num_instances', 0))
    self.num_io_ops = int(self.stats.get('io_workload', 0))
    self.failed_builds = int(self.stats.get('failed_builds', 0))

These values are used for both filtering and weighing compute hosts.
However, the 'stats' property of the compute node is cleared and then
repopulated during the periodic update_available_resources(). The clearing
occurs in RT._copy_resources(), which preserves only the old value of
'failed_builds'. This creates a race condition between the resource
tracker (RT) and the scheduler: if the scheduler reads the compute node
between the clear and the repopulation, wrong values for 'num_io_ops' and
'num_instances' may be populated into the HostState object, leading to
incorrect scheduling decisions.
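
The window described above can be illustrated with a small, deterministic
sketch. This is not nova code: the HostState class below is a simplified
stand-in that only copies the four assignments quoted earlier, and
clear_stats_keep_failed_builds() and repopulate() are hypothetical helpers
that model the clear-then-repopulate behaviour as this report describes it.
The "scheduler read" is simulated by building a HostState between the two
steps rather than by real threads.

```python
class HostState:
    """Simplified stand-in for the scheduler's HostState (illustrative only)."""

    def __init__(self, stats):
        # Same derivation as the snippet quoted in the bug description.
        self.stats = stats or {}
        self.num_instances = int(self.stats.get('num_instances', 0))
        self.num_io_ops = int(self.stats.get('io_workload', 0))
        self.failed_builds = int(self.stats.get('failed_builds', 0))


def clear_stats_keep_failed_builds(stats):
    """Hypothetical model of the clearing step: only 'failed_builds' survives."""
    failed = stats.get('failed_builds', 0)
    stats.clear()
    stats['failed_builds'] = failed


def repopulate(stats, num_instances, io_workload):
    """Hypothetical model of the stats being filled in again afterwards."""
    stats['num_instances'] = num_instances
    stats['io_workload'] = io_workload


# Stats as they look before the periodic update starts.
stats = {'num_instances': 5, 'io_workload': 2, 'failed_builds': 1}

before = HostState(dict(stats))        # scheduler reads before the update
clear_stats_keep_failed_builds(stats)  # periodic task clears the stats
during = HostState(dict(stats))        # scheduler reads inside the window
repopulate(stats, num_instances=5, io_workload=2)
after = HostState(dict(stats))         # scheduler reads after the update

for label, hs in (('before', before), ('during', during), ('after', after)):
    print(label, hs.num_instances, hs.num_io_ops, hs.failed_builds)
```

In this model, a read that lands inside the window sees a host with no
instances and no I/O workload even though nothing on the host has changed,
while 'failed_builds' is unaffected because it is the one value carried over.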

** Affects: nova
     Importance: High
     Assignee: Radoslav Gerganov (rgerganov)
     Status: In Progress

** Tags: scheduler

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1798806
Title:
Race condition between RT and scheduler
Status in OpenStack Compute (nova):
In Progress
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1798806/+subscriptions