yahoo-eng-team team mailing list archive
Message #87397
[Bug 1946753] [NEW] scheduler doesn't update weights
Public bug reported:
This was observed on the Train release, but as far as I can see the relevant logic has not changed since.
When scheduling a number of instances at the same time, weights do not get updated for subsequent instances.
It seems that the _consume_selected_host() function has no effect: when the scheduler starts scheduling another instance, it begins by getting host states directly from the compute nodes.
The problem is that the host state update on the compute node only happens once the instance starts building, which in many cases seems to be too late. As a consequence, the compute nodes considered for the next instance are weighed with inaccurate weights.
The result is that the distribution of VMs across compute nodes is not as expected.
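The effect described above can be illustrated with a small toy model. This is only a sketch of the suspected behaviour, not nova code: the HostState class, the reported dictionary and the schedule() helper are all hypothetical, and the second host vcmp2 with its 7000MB is invented so that there is something to compare against. The 7328MB and 512MB figures are taken from the log excerpts in this report.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Toy stand-in for the scheduler's in-memory view of one compute node."""
    name: str
    free_ram_mb: int

    def consume(self, ram_mb: int) -> None:
        # Mimics the idea of consuming resources for a selected host:
        # only this in-memory copy is changed.
        self.free_ram_mb -= ram_mb


# Hypothetical store of what the compute nodes have reported so far.
# vcmp2 and its value are made up for the illustration.
reported = {"vcmp1": 7328, "vcmp2": 7000}


def schedule(ram_mb: int):
    """Pick the host with the most free RAM, starting from reported state."""
    # Each request rebuilds its host states from the reported values,
    # so anything consumed by an earlier request is not visible here.
    hosts = [HostState(name, ram) for name, ram in reported.items()]
    best = max(hosts, key=lambda h: h.free_ram_mb)
    seen = best.free_ram_mb
    best.consume(ram_mb)
    return best.name, seen, best.free_ram_mb


first = schedule(512)
second = schedule(512)   # compute node has not reported the first VM yet

# Once the first instance starts building, the node reports the usage.
reported["vcmp1"] -= 512
third = schedule(512)

print(first)
print(second)
print(third)
```

In this model the first two requests both see 7328MB on vcmp1 and both land there, while the third request, made after the reported value changes, sees 6816MB and picks the other host.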
I managed to reproduce the problem even when creating just two instances at the same time.
In one test with 50 instances, I observed 17 instances scheduled based on the same weight values as the first of them.
Below are log excerpts from nova-scheduler.log, with comments, to show what I mean.
This example focuses on RAMWeigher.
In this case two instances were created at the same time with the openstack CLI.
The first instance is being scheduled:
2021-10-11 15:58:18.484 20 DEBUG nova.scheduler.manager [req-c068a693-7f03-4a75-b5b0-54f5e34f8340 2ee7a9b8a93c4cb0a12cd2cfab8ecd04 d3e8e3c73abd4b0fa1d4fc354ee0c3a7 - default default] Starting to schedule for instances: ['d95ba6be-7a19-4d70-9280-27a367f7b102'] select_destinations /usr/lib/python3.6/site-packages/nova/scheduler/manager.py:133
Selected host for the first instance, with the values used for that selection:
2021-10-11 15:58:18.853 20 DEBUG nova.scheduler.filter_scheduler [req-c068a693-7f03-4a75-b5b0-54f5e34f8340 2ee7a9b8a93c4cb0a12cd2cfab8ecd04 d3e8e3c73abd4b0fa1d4fc354ee0c3a7 - default default] [instance: d95ba6be-7a19-4d70-9280-27a367f7b102] Selected host: (vcmp1, vcmp1) ram: 7328MB disk: 38912MB io_ops: 0 instances: 0 _consume_selected_host /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:354
Selected host for the first instance, with the values reduced by the amounts allocated for that instance.
These values should be used when scheduling the next instance, in particular ram: 6816MB for RAMWeigher.
This log line is the result of an extra LOG.debug(...) added to the code (notice the different line number, 357, at the end):
2021-10-11 15:58:18.856 20 DEBUG nova.scheduler.filter_scheduler [req-c068a693-7f03-4a75-b5b0-54f5e34f8340 2ee7a9b8a93c4cb0a12cd2cfab8ecd04 d3e8e3c73abd4b0fa1d4fc354ee0c3a7 - default default] [instance: d95ba6be-7a19-4d70-9280-27a367f7b102] Selected host after consume_from_request: (vcmp1, vcmp1) ram: 6816MB disk: 37888MB io_ops: 1 instances: 1 _consume_selected_host /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:357
The second instance is being scheduled:
2021-10-11 15:58:19.487 22 DEBUG nova.scheduler.manager [req-7c2dff56-2a94-491e-baab-6080524aa592 2ee7a9b8a93c4cb0a12cd2cfab8ecd04 d3e8e3c73abd4b0fa1d4fc354ee0c3a7 - default default] Starting to schedule for instances: ['92e67f44-898b-4a07-a841-b2ffd296d089'] select_destinations /usr/lib/python3.6/site-packages/nova/scheduler/manager.py:133
Selected host for the second instance, with the values used for that selection.
The value seen by RAMWeigher is 7328MB, the same as for the first instance.
It should be 6816MB instead, as it was right after the _consume_selected_host() method was executed:
2021-10-11 15:58:19.772 22 DEBUG nova.scheduler.filter_scheduler [req-7c2dff56-2a94-491e-baab-6080524aa592 2ee7a9b8a93c4cb0a12cd2cfab8ecd04 d3e8e3c73abd4b0fa1d4fc354ee0c3a7 - default default] [instance: 92e67f44-898b-4a07-a841-b2ffd296d089] Selected host: (vcmp1, vcmp1) ram: 7328MB disk: 38912MB io_ops: 0 instances: 0 _consume_selected_host /usr/lib/python3.6/site-packages/nova/scheduler/filter_scheduler.py:354
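For anyone who wants to check the numbers, the comparison can be scripted. The following is a rough sketch that pulls the resource figures out of the three "Selected host" messages above. The messages are trimmed to their tail for readability, and the regular expression and the parse() helper are my own assumptions about the message layout, not anything provided by nova.

```python
import re

# Tails of the three "Selected host" log messages quoted above (trimmed).
LINES = [
    "Selected host: (vcmp1, vcmp1) ram: 7328MB disk: 38912MB "
    "io_ops: 0 instances: 0",
    "Selected host after consume_from_request: (vcmp1, vcmp1) "
    "ram: 6816MB disk: 37888MB io_ops: 1 instances: 1",
    "Selected host: (vcmp1, vcmp1) ram: 7328MB disk: 38912MB "
    "io_ops: 0 instances: 0",
]

# Assumed layout of the resource part of the message.
PATTERN = re.compile(
    r"ram: (\d+)MB disk: (\d+)MB io_ops: (\d+) instances: (\d+)"
)


def parse(line: str) -> dict:
    """Extract the host resource figures from one log message."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    ram, disk, io_ops, instances = map(int, match.groups())
    return {"ram_mb": ram, "disk_mb": disk,
            "io_ops": io_ops, "instances": instances}


first, consumed, second = (parse(line) for line in LINES)

# What the first instance took from the in-memory host state.
ram_delta = first["ram_mb"] - consumed["ram_mb"]
disk_delta = first["disk_mb"] - consumed["disk_mb"]

# True when the second request saw exactly the pre-consume values.
stale = second == first

print(ram_delta, disk_delta, stale)
```

With the values from these logs, the first instance accounts for 512MB of RAM and 1024MB of disk, and the second request sees none of that.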
** Affects: nova
Importance: Undecided
Status: New
** Tags: nova-scheduler
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1946753
Title:
scheduler doesn't update weights
Status in OpenStack Compute (nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1946753/+subscriptions