yahoo-eng-team team mailing list archive
Message #30549
[Bug 1437199] [NEW] zookeeper driver used with O(n^2) complexity by the scheduler
Public bug reported:
(Loop1) https://github.com/openstack/nova/blob/af2d6c9576b1ac5f3b3768870bb15d9b5cf1610b/nova/scheduler/driver.py#L55
(Loop2) https://github.com/openstack/nova/blob/af2d6c9576b1ac5f3b3768870bb15d9b5cf1610b/nova/servicegroup/drivers/zk.py#L177
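As the title says, the two loops nest: Loop1 iterates over the hosts in
the scheduler, and each iteration reaches Loop2 in the zk driver.
Iterating over the hosts through the ComputeFilter has the same issue,
and using ComputeFilter in a loop has other performance issues as well.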
The zk driver issue can be mitigated by reorganizing the code so that
the testing (`filtering`) is done in is_up instead of get_all.
However, a better solution would be to have the scheduler use get_all,
or to redesign the servicegroup management.
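The cost difference between calling is_up per host and calling get_all
once can be sketched as follows. This is a simplified model and not the
nova code: FakeGroup, hosts_up_per_host and hosts_up_bulk are
hypothetical names, and the member list only stands in for whatever the
zk driver's get_all returns.

```python
class FakeGroup:
    """Hypothetical stand-in for a servicegroup driver (not the nova API)."""

    def __init__(self, members):
        self._members = list(members)
        self.scanned = 0  # number of member entries handed back so far

    def get_all(self):
        # Returning the full membership costs O(n).
        self.scanned += len(self._members)
        return list(self._members)

    def is_up(self, host):
        # Models the pattern the report describes: every is_up call
        # fetches the whole membership and then searches it.
        return host in self.get_all()


def hosts_up_per_host(group, hosts):
    # One is_up call per host: O(n) calls, O(n) work each, O(n^2) total.
    return [h for h in hosts if group.is_up(h)]


def hosts_up_bulk(group, hosts):
    # One get_all call, then O(1) set lookups: O(n) total.
    alive = set(group.get_all())
    return [h for h in hosts if h in alive]


hosts = ["host%d" % i for i in range(1000)]
slow = FakeGroup(hosts[:900])
fast = FakeGroup(hosts[:900])
assert hosts_up_per_host(slow, hosts) == hosts_up_bulk(fast, hosts)
print(slow.scanned, fast.scanned)  # 900000 900
```

Both functions return the same host list. Only the amount of membership
data walked differs, which is the point of letting the scheduler call
get_all once.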
A better design would be to use the DB even with the zk and mc drivers,
but to update it ONLY when a service actually comes up or dies. In this
case the sg drivers MAY need dedicated service processes.
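A minimal sketch of the "update ONLY on a state change" idea, assuming
a dedicated process that periodically sees the current group
membership. FakeServiceTable, membership_changes and sync are
hypothetical names used for illustration. They are not part of nova or
of any servicegroup driver.

```python
def membership_changes(previous, current):
    """Return (came_up, went_down) between two membership snapshots."""
    previous, current = set(previous), set(current)
    return sorted(current - previous), sorted(previous - current)


class FakeServiceTable:
    """Hypothetical in-memory stand-in for the services table in the DB."""

    def __init__(self):
        self.up = {}
        self.writes = 0  # number of simulated DB UPDATEs

    def set_up(self, host, is_up):
        self.up[host] = is_up
        self.writes += 1


def sync(table, previous, current):
    # Write to the table only for hosts whose state actually changed.
    came_up, went_down = membership_changes(previous, current)
    for host in came_up:
        table.set_up(host, True)
    for host in went_down:
        table.set_up(host, False)
    return set(current)


table = FakeServiceTable()
seen = sync(table, set(), {"a", "b", "c"})  # three services come up
seen = sync(table, seen, {"a", "b", "c"})   # nothing changed, no writes
seen = sync(table, seen, {"a", "c"})        # "b" died, one write
print(table.writes, table.up)
```

With this shape the write rate follows the number of state transitions
and not the number of hosts multiplied by the heartbeat frequency.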
NOTE: The servicegroup driver concept was introduced to avoid doing
10_000 DB updates/sec @100_000 hosts (one update every 10 sec per host).
Even if your servers are bad and every server has a 1:1000 chance to die
on a given day, updating only on state changes would lead to only about
0.001 UPDATE/sec (100/day) @100_000 hosts.
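The arithmetic behind these two rates, written out. The constants are
the ones from the note, and the 10 second interval is the reading of
"10/sec update freq" that reproduces the 10_000 updates/sec figure.

```python
HOSTS = 100_000
REPORT_INTERVAL_S = 10        # one heartbeat per host every 10 seconds
FAILURE_ODDS = 1000           # each host dies with a 1:1000 chance per day
SECONDS_PER_DAY = 24 * 60 * 60

# Heartbeat model: every host writes to the DB on every interval.
heartbeat_updates_per_s = HOSTS / REPORT_INTERVAL_S

# Transition model: only a death triggers a write. Recoveries would add
# a similar number of writes and are not counted here.
failures_per_day = HOSTS / FAILURE_ODDS
transition_updates_per_s = failures_per_day / SECONDS_PER_DAY

print(heartbeat_updates_per_s)             # 10000.0
print(failures_per_day)                    # 100.0
print(round(transition_updates_per_s, 4))  # 0.0012
```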
NOTE: If the up/down state is knowable just from the DB, the scheduler
could eliminate the dead hosts in the first DB query, without using
ComputeFilter as it is used now. (The plugins SHOULD be able to extend
the base hosts query.)
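If the up/down state lived in the DB, the dead hosts could be dropped
in the query itself. The sketch below uses an in-memory SQLite table
with a hypothetical schema. The table layout, the is_up column and the
live_compute_hosts function are assumptions for illustration and do not
reflect the actual nova schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified services table.
conn.execute(
    "CREATE TABLE services (host TEXT PRIMARY KEY, topic TEXT, is_up INTEGER)"
)
conn.executemany(
    "INSERT INTO services VALUES (?, ?, ?)",
    [
        ("node1", "compute", 1),
        ("node2", "compute", 0),  # dead host, never reaches the scheduler
        ("node3", "compute", 1),
        ("ctl1", "scheduler", 1),
    ],
)


def live_compute_hosts(conn):
    # The liveness test is part of the WHERE clause, so no per-host
    # is_up call is needed after the query returns.
    rows = conn.execute(
        "SELECT host FROM services WHERE topic = ? AND is_up = 1 ORDER BY host",
        ("compute",),
    )
    return [row[0] for row in rows]


print(live_compute_hosts(conn))  # ['node1', 'node3']
```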
** Affects: nova
Importance: Undecided
Status: New
** Summary changed:
- zookeper driver used with O(n^2) complexity by the scheduler
+ zookeeper driver used with O(n^2) complexity by the scheduler
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1437199
Title:
zookeeper driver used with O(n^2) complexity by the scheduler
Status in OpenStack Compute (Nova):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1437199/+subscriptions