yahoo-eng-team team mailing list archive
Message #73925
[Bug 1781710] Re: ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity failing with "Servers are on the same host"
Reviewed: https://review.openstack.org/583347
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c22b53c2481bac518a6b32cdee7b7df23d91251e
Submitter: Zuul
Branch: master
commit c22b53c2481bac518a6b32cdee7b7df23d91251e
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Tue Jul 17 17:43:37 2018 -0400
Update RequestSpec.instance_uuid during scheduling
Before change I4b67ec9dd4ce846a704d0f75ad64c41e693de0fb
the ServerGroupAntiAffinityFilter did not rely on the
HostState.instances dict to determine **within the same
request** how many members of the same anti-affinity
group were on a given host. In fact, at that time, the
HostState.instances dict wasn't updated between filter
runs in the same multi-create request. That was fixed with
change Iacc636fa8a59a9e8670a8d683c10bdbb0dc8237b so that
as we select a host for each group member being created
within a single request, we also update the HostState.instances
dict so the ServerGroupAntiAffinityFilter would have
accurate tracking of which group members were on which
hosts.
However, that did not account for a wrinkle in the filter
added in change Ie016f59f5b98bb9c70b3e33556bd747f79fc77bd
which is needed to allow resizing a server to the same host
when that server is in an anti-affinity group. That wrinkle,
combined with the fact that the RequestSpec the filter is acting
upon for a given instance in a multi-create request might
not actually have that instance's instance_uuid, can cause the
filter to think it's in a resize situation and accept a host which
already has another member from the same anti-affinity group
on it, which breaks the anti-affinity policy.
For background, during a multi-create request, we create a
RequestSpec per instance being created, but conductor only
sends the first RequestSpec for the first instance to the
scheduler. This means RequestSpec.num_instances can be >1
and we can be processing the Nth instance in the list during
scheduling but working on a RequestSpec for the first instance.
That is what breaks the resize check in ServerGroupAntiAffinityFilter
with regard to multi-create.
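To make the failure mode concrete, here is a toy model of it. This is not
nova's code: Host, Spec, anti_affinity_passes and schedule_with_shared_spec
are hypothetical stand-ins for HostState, RequestSpec, the filter and the
scheduling loop, reduced to the one check that matters here.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for HostState: a name plus instance UUIDs."""
    name: str
    instances: set = field(default_factory=set)


@dataclass
class Spec:
    """Hypothetical stand-in for RequestSpec."""
    instance_uuid: str
    group_members: frozenset


def anti_affinity_passes(host, spec):
    # Resize allowance: the instance is already on this host, so accept it.
    if spec.instance_uuid in host.instances:
        return True
    # Otherwise reject hosts that already hold a member of the group.
    return not (spec.group_members & host.instances)


def schedule_with_shared_spec(hosts, spec, instance_uuids):
    """Place every instance using the same, unmodified spec (the buggy flow)."""
    placements = {}
    for uuid in instance_uuids:
        chosen = next(h for h in hosts if anti_affinity_passes(h, spec))
        chosen.instances.add(uuid)  # mirrors HostState.instances being updated
        placements[uuid] = chosen.name
    return placements


hosts = [Host("host1"), Host("host2")]
spec = Spec(instance_uuid="a", group_members=frozenset({"a", "b"}))
print(schedule_with_shared_spec(hosts, spec, ["a", "b"]))
# Both "a" and "b" land on host1: while placing "b" the spec still carries
# "a", which is already on host1, so the resize allowance fires.
```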
To resolve this, we update the RequestSpec.instance_uuid when
filtering hosts for a given instance but we don't persist that
change to the RequestSpec.
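A minimal sketch of that idea, using the same kind of toy model rather than
nova's real objects. All names here (Host, Spec, anti_affinity_passes,
schedule_per_instance) are hypothetical, and restoring the original value in
a finally block is only an assumption standing in for "don't persist the
change"; the actual patch may achieve that differently.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical stand-in for HostState."""
    name: str
    instances: set = field(default_factory=set)


@dataclass
class Spec:
    """Hypothetical stand-in for RequestSpec."""
    instance_uuid: str
    group_members: frozenset


def anti_affinity_passes(host, spec):
    # Resize allowance: only valid if spec.instance_uuid is the instance
    # actually being scheduled right now.
    if spec.instance_uuid in host.instances:
        return True
    return not (spec.group_members & host.instances)


def schedule_per_instance(hosts, spec, instance_uuids):
    """Point the shared spec at each instance in turn while filtering."""
    original_uuid = spec.instance_uuid
    placements = {}
    try:
        for uuid in instance_uuids:
            spec.instance_uuid = uuid  # temporary, per-instance update
            chosen = next(
                (h for h in hosts if anti_affinity_passes(h, spec)), None)
            if chosen is None:
                raise RuntimeError("no valid host for %s" % uuid)
            chosen.instances.add(uuid)
            placements[uuid] = chosen.name
    finally:
        # Assumption: leave the spec as the caller passed it in.
        spec.instance_uuid = original_uuid
    return placements


hosts = [Host("host1"), Host("host2")]
spec = Spec(instance_uuid="a", group_members=frozenset({"a", "b"}))
print(schedule_per_instance(hosts, spec, ["a", "b"]))
```

With the spec pointing at "b" while "b" is being placed, host1 (which already
holds "a") is rejected and "b" goes to host2, while a genuine resize of "a"
onto host1 would still pass the first check.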
This is a bit clunky, but it kind of comes with the territory of
how we hack scheduling requests together using a single RequestSpec
for multi-create requests. An alternative to this is found in change
I0dd1fa5a70ac169efd509a50b5d69ee5deb8deb7 where we rely on the
RequestSpec.num_instances field to determine if we're in a multi-create
situation, but that has its own flaws because num_instances is
persisted with the RequestSpec which might cause us to re-introduce
bug 1558532 if we're not careful. In that case we'd have to either
(1) stop persisting RequestSpec.num_instances or (2) temporarily
unset it like we do using RequestSpec.reset_forced_destinations()
during move operations.
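For comparison, a rough sketch of what the num_instances-based alternative
and its pitfall could look like. Again this is a toy model with hypothetical
names, not the code from that change; dataclasses.replace is used here only
as a stand-in for "temporarily unset the field".

```python
from dataclasses import dataclass, field, replace


@dataclass
class Host:
    """Hypothetical stand-in for HostState."""
    name: str
    instances: set = field(default_factory=set)


@dataclass
class Spec:
    """Hypothetical stand-in for RequestSpec, including num_instances."""
    instance_uuid: str
    group_members: frozenset
    num_instances: int = 1


def anti_affinity_passes(host, spec):
    # Only honor the resize allowance outside of multi-create requests.
    is_multi_create = spec.num_instances > 1
    if not is_multi_create and spec.instance_uuid in host.instances:
        return True
    return not (spec.group_members & host.instances)


host1 = Host("host1", {"a"})

# A spec that was stored with num_instances=2 at create time and is reused
# later for a resize of "a": the stale value disables the resize allowance.
stale = Spec("a", frozenset({"a", "b"}), num_instances=2)
print(anti_affinity_passes(host1, stale))

# Option (2) from the text, sketched as a non-persisted copy with the field
# reset for the duration of the move operation.
print(anti_affinity_passes(host1, replace(stale, num_instances=1)))
```

In this model the first call rejects the same-host resize and the second
accepts it, which appears to be the kind of regression the message is warning
about if num_instances stays persisted.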
Co-Authored-By: Sean Mooney <work@xxxxxxxxxxxxxxx>
Closes-Bug: #1781710
Change-Id: Icba22060cb379ebd5e906981ec283667350b8c5a
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1781710
Title:
ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity
failing with "Servers are on the same host"
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Started seeing this recently, which looks like a regression:
http://logs.openstack.org/44/564444/14/check/neutron-tempest-multinode-full/dba40b9/job-output.txt.gz#_2018-07-13_19_53_15_275866
2018-07-13 19:53:15.275866 | primary | {1} tempest.api.compute.admin.test_servers_on_multinodes.ServersOnMultiNodesTest.test_create_server_with_scheduler_hint_group_anti_affinity [7.164074s] ... FAILED
2018-07-13 19:53:15.275944 | primary |
2018-07-13 19:53:15.276012 | primary | Captured traceback:
2018-07-13 19:53:15.276075 | primary | ~~~~~~~~~~~~~~~~~~~
2018-07-13 19:53:15.276171 | primary | Traceback (most recent call last):
2018-07-13 19:53:15.276452 | primary | File "tempest/api/compute/admin/test_servers_on_multinodes.py", line 115, in test_create_server_with_scheduler_hint_group_anti_affinity
2018-07-13 19:53:15.276598 | primary | 'Servers are on the same host: %s' % hosts)
2018-07-13 19:53:15.276857 | primary | File "/opt/stack/new/tempest/.tox/tempest/local/lib/python2.7/site-packages/unittest2/case.py", line 845, in assertNotEqual
2018-07-13 19:53:15.276965 | primary | raise self.failureException(msg)
2018-07-13 19:53:15.277830 | primary | AssertionError: u'ubuntu-xenial-rax-dfw-0000714118' == u'ubuntu-xenial-rax-dfw-0000714118' : Servers are on the same host: {u'c166e283-477c-4ecf-9c1c-2dcd731a6d6a': u'ubuntu-xenial-rax-dfw-0000714118', u'6eb63e79-122e-45f9-931f-0750047116d1': u'ubuntu-xenial-rax-dfw-0000714118'}
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22AssertionError%5C%22%20AND%20message%3A%5C%22Servers%20are%20on%20the%20same%20host%3A%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d
According to logstash, it looks like this started around July 9.
These changes merged to nova on July 9 but shouldn't impact this
scheduling behavior yet:
https://github.com/openstack/nova/commit/57b0bb374963bdbf0aef910feaccb8f536641c41
https://github.com/openstack/nova/commit/afc7650e64753ab7687ae2c4f2714d4bb78a4e5a
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1781710/+subscriptions