yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #93889
[Bug 1542491] Re: Scheduler update_aggregates race causes incorrect aggregate information
** Changed in: nova
Status: Confirmed => Opinion
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1542491
Title:
Scheduler update_aggregates race causes incorrect aggregate
information
Status in OpenStack Compute (nova):
Opinion
Status in Ubuntu:
Invalid
Bug description:
It appears that if nova-api receives simultaneous requests to add a
server to a host aggregate, then a race occurs that can lead to nova-
scheduler having incorrect aggregate information in memory.
One observed effect of this is that sometimes nova-scheduler will
think a smaller number of hosts are a member of the aggregate than is
in the nova database and will filter out a host that should not be
filtered.
Restarting nova-scheduler fixes the issue, as it reloads the aggregate
information on startup.
Nova package versions: 1:2015.1.2-0ubuntu2~cloud0
Reproduce steps:
Create a new os-aggregate and then populate an os-aggregate with
simultaneous API POSTs, note timestamps:
2016-02-04 20:17:08.538 13648 INFO nova.osapi_compute.wsgi.server [req-d07a006e-134a-46d8-9815-6becec5b185c 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] 10.120.13.3 "POST /v2.1/326d453c2bd440b4a7160489b632d0a8/os-aggregates HTTP/1.1" status: 200 len: 439 time: 0.1865470
2016-02-04 20:17:09.204 13648 INFO nova.osapi_compute.wsgi.server [req-a0402297-9337-46d6-96d2-066e230e45e1 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] 10.120.13.2 "POST /v2.1/326d453c2bd440b4a7160489b632d0a8/os-aggregates/1/action HTTP/1.1" status: 200 len: 506 time: 0.2995598
2016-02-04 20:17:09.243 13648 INFO nova.osapi_compute.wsgi.server [req-0f543525-c34e-418a-91a9-894d714ee95b 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] 10.120.13.2 "POST /v2.1/326d453c2bd440b4a7160489b632d0a8/os-aggregates/1/action HTTP/1.1" status: 200 len: 519 time: 0.3140590
2016-02-04 20:17:09.273 13649 INFO nova.osapi_compute.wsgi.server [req-2f8d80b0-726f-4126-a8ab-a2eae3f1a385 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] 10.120.13.2 "POST /v2.1/326d453c2bd440b4a7160489b632d0a8/os-aggregates/1/action HTTP/1.1" status: 200 len: 506 time: 0.3759601
2016-02-04 20:17:09.275 13649 INFO nova.osapi_compute.wsgi.server [req-80ab6c86-e521-4bf0-ab67-4de9d0eccdd3 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] 10.120.13.1 "POST /v2.1/326d453c2bd440b4a7160489b632d0a8/os-aggregates/1/action HTTP/1.1" status: 200 len: 506 time: 0.3433032
Schedule a VM
Expected Result:
nova-scheduler Availability Zone filter returns all members of the aggregate
Actual Result:
nova-scheduler believes there is only one hypervisor in the aggregate. The number will vary as it is a race:
2016-02-05 07:48:04.411 13600 DEBUG nova.filters [req-c24338b5-a3b8-4864-8140-04ea6fbcf68f 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] Starting with 4 host(s) get_filtered_objects /usr/lib/python2.7/dist-packages/nova/filters.py:70
2016-02-05 07:48:04.411 13600 DEBUG nova.filters [req-c24338b5-a3b8-4864-8140-04ea6fbcf68f 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] Filter RetryFilter returned 4 host(s) get_filtered_objects /usr/lib/python2.7/dist-packages/nova/filters.py:84
2016-02-05 07:48:04.412 13600 DEBUG nova.scheduler.filters.availability_zone_filter [req-c24338b5-a3b8-4864-8140-04ea6fbcf68f 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] Availability Zone 'temp' requested. (oshv0, oshv0) ram:122691 disk:13404160 io_ops:0 instances:0 has AZs: nova host_passes /usr/lib/python2.7/dist-packages/nova/scheduler/filters/availability_zone_filter.py:62
2016-02-05 07:48:04.412 13600 DEBUG nova.scheduler.filters.availability_zone_filter [req-c24338b5-a3b8-4864-8140-04ea6fbcf68f 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] Availability Zone 'temp' requested. (oshv2, oshv2) ram:122691 disk:13403136 io_ops:0 instances:0 has AZs: nova host_passes /usr/lib/python2.7/dist-packages/nova/scheduler/filters/availability_zone_filter.py:62
2016-02-05 07:48:04.413 13600 DEBUG nova.scheduler.filters.availability_zone_filter [req-c24338b5-a3b8-4864-8140-04ea6fbcf68f 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] Availability Zone 'temp' requested. (oshv1, oshv1) ram:122691 disk:13404160 io_ops:0 instances:0 has AZs: nova host_passes /usr/lib/python2.7/dist-packages/nova/scheduler/filters/availability_zone_filter.py:62
2016-02-05 07:48:04.413 13600 DEBUG nova.filters [req-c24338b5-a3b8-4864-8140-04ea6fbcf68f 41812fc01c6549ac8ed15c6dab05c670 326d453c2bd440b4a7160489b632d0a8 - - -] Filter AvailabilityZoneFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.7/dist-packages/nova/filters.py:84
Nova API calls show the correct number of members.
I suspect that it is caused by the simultaneous processing or out-of-order receipt of update_aggregates RPC calls.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1542491/+subscriptions
References