yahoo-eng-team team mailing list archive

[Bug 1437199] [NEW] zookeeper driver used with O(n^2) complexity by the scheduler

 

Public bug reported:

(Loop1) https://github.com/openstack/nova/blob/af2d6c9576b1ac5f3b3768870bb15d9b5cf1610b/nova/scheduler/driver.py#L55
(Loop2) https://github.com/openstack/nova/blob/af2d6c9576b1ac5f3b3768870bb15d9b5cf1610b/nova/servicegroup/drivers/zk.py#L177

Iterating over the hosts with the ComputeFilter has the same issue, and
using the ComputeFilter in a loop has other performance problems as well.
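A minimal Python sketch of the pattern the two loops form, assuming (for illustration only) that is_up() calls get_all() and then scans the returned member list. CountingMembership and hosts_up are hypothetical stand-ins, not nova code; the counter shows n is_up() calls each doing up to n comparisons.

```python
class CountingMembership:
    """Hypothetical stand-in for the zk servicegroup driver (not nova code)."""

    def __init__(self, live_hosts):
        self._live = list(live_hosts)
        self.comparisons = 0  # host-name comparisons performed by is_up()

    def get_all(self, group_id):
        # Returns a copy of every live member of the group: O(n).
        return list(self._live)

    def is_up(self, host, group_id="compute"):
        # Loop2 (assumed shape): linear scan of the full member list: O(n).
        for member in self.get_all(group_id):
            self.comparisons += 1
            if member == host:
                return True
        return False


def hosts_up(hosts, driver):
    # Loop1 (assumed shape): one is_up() call per host, i.e. n calls of O(n).
    return [host for host in hosts if driver.is_up(host)]


hosts = ["host%d" % i for i in range(1000)]
driver = CountingMembership(hosts)
up = hosts_up(hosts, driver)
print(len(up), driver.comparisons)  # 1000 500500
```

With 1,000 hosts this is already about half a million comparisons; doubling the host count roughly quadruples it.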

The zk driver issue can be mitigated by reorganizing the code so that the
membership test (the `filtering`) is done in is_up instead of in get_all.
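One way to read the suggested mitigation, sketched under assumptions: is_up() answers for a single member with a direct lookup instead of building and scanning the full get_all() result. The in-memory set below is a hypothetical stand-in for a per-member check against ZooKeeper; DirectCheckMembership and hosts_up are illustrative names, not nova code.

```python
class DirectCheckMembership:
    """Hypothetical reorganized driver: is_up() checks one member directly."""

    def __init__(self, live_hosts, group_id="compute"):
        # Assumption: live members indexed per group in a set, standing in
        # for a direct per-member existence check in ZooKeeper.
        self._groups = {group_id: set(live_hosts)}
        self.lookups = 0  # number of single-member checks performed

    def get_all(self, group_id):
        # Still available for callers that really need the full member list.
        return sorted(self._groups.get(group_id, set()))

    def is_up(self, host, group_id="compute"):
        # Average O(1) hash lookup; get_all() is never called here.
        self.lookups += 1
        return host in self._groups.get(group_id, set())


def hosts_up(hosts, driver):
    # The scheduler loop is unchanged: still one is_up() call per host.
    return [host for host in hosts if driver.is_up(host)]


hosts = ["host%d" % i for i in range(1000)]
driver = DirectCheckMembership(hosts[:900])
up = hosts_up(hosts, driver)
print(len(up), driver.lookups)  # 900 1000
```

The loop becomes O(n) overall, but it still makes one driver call per host; with a real ZooKeeper backend each of those calls may cost a round trip.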


However, a better solution would be to have the scheduler call get_all
directly, or to redesign the servicegroup management.
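A sketch of the first alternative, assuming get_all() returns the live host names for a group: the scheduler asks the driver once and filters locally. StubDriver and hosts_up_batched are hypothetical; only the get_all name comes from the text.

```python
class StubDriver:
    """Hypothetical servicegroup driver stub that counts get_all() calls."""

    def __init__(self, live_hosts):
        self._live = list(live_hosts)
        self.get_all_calls = 0

    def get_all(self, group_id):
        self.get_all_calls += 1
        return list(self._live)


def hosts_up_batched(hosts, driver, group_id="compute"):
    # One get_all() call (O(n)), indexed into a set, then O(1) average
    # lookups per host: O(n) overall instead of O(n^2).
    live = set(driver.get_all(group_id))
    return [host for host in hosts if host in live]


hosts = ["host%d" % i for i in range(1000)]
driver = StubDriver(hosts[::2])  # every other host is alive
up = hosts_up_batched(hosts, driver)
print(len(up), driver.get_all_calls)  # 500 1
```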

A better design would be to use the DB even with the zk and mc (memcache)
drivers, but to update it ONLY when a service actually comes up or dies;
in this case the sg (servicegroup) drivers MAY need dedicated service
processes.
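A minimal sketch of the "update only on transitions" idea, assuming a dedicated process periodically receives the current live-member set from the zk or mc backend. TransitionWriter and the service_status table are hypothetical, and SQLite stands in for the nova DB; heartbeats that change nothing produce no DB writes.

```python
import sqlite3


class TransitionWriter:
    """Hypothetical sg process: writes to the DB only when a host changes state."""

    def __init__(self, conn):
        self.conn = conn
        self.known_up = set()  # hosts currently recorded as up in the DB
        self.writes = 0        # number of row writes issued so far
        conn.execute("CREATE TABLE IF NOT EXISTS service_status ("
                     "host TEXT PRIMARY KEY, up INTEGER NOT NULL)")

    def observe(self, live_hosts):
        live = set(live_hosts)
        came_up = live - self.known_up
        died = self.known_up - live
        with self.conn:  # commit both kinds of change in one transaction
            for host in sorted(came_up):
                self.conn.execute("INSERT OR REPLACE INTO service_status "
                                  "(host, up) VALUES (?, 1)", (host,))
            for host in sorted(died):
                self.conn.execute("UPDATE service_status SET up = 0 "
                                  "WHERE host = ?", (host,))
        self.writes += len(came_up) + len(died)
        self.known_up = live


conn = sqlite3.connect(":memory:")
writer = TransitionWriter(conn)
writer.observe(["a", "b", "c"])  # three hosts come up: 3 writes
writer.observe(["a", "b", "c"])  # nothing changed: 0 writes
writer.observe(["a", "c"])       # "b" died: 1 write
print(writer.writes)  # 4
print(conn.execute("SELECT host, up FROM service_status ORDER BY host").fetchall())
# [('a', 1), ('b', 0), ('c', 1)]
```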

NOTE: The servicegroup driver concept was introduced to avoid doing 10,000 DB updates/sec at 100,000 hosts (one heartbeat update per host every 10 seconds).
Even if your servers are bad and every server has a 1-in-1000 chance of dying on a given day, updating only on up/down changes would lead to just ~0.001 UPDATEs/sec (100/day) at 100,000 hosts.
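The arithmetic in this note, written out as a small Python check. The inputs are the ones stated above: 100,000 hosts, one heartbeat write per host every 10 seconds, and a 1-in-1000 chance per host of dying on a given day, assuming one DB write per death.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def heartbeat_updates_per_sec(hosts, report_interval_s):
    # Every host writes a heartbeat once per report interval.
    return hosts / report_interval_s


def failure_updates_per_sec(hosts, one_in):
    # Expected deaths per day (hosts / one_in), spread evenly over the day,
    # with a single DB write per death.
    return (hosts / one_in) / SECONDS_PER_DAY


heartbeat = heartbeat_updates_per_sec(100_000, 10)
failures = failure_updates_per_sec(100_000, 1000)
print(heartbeat)                     # 10000.0
print(round(failures, 5))            # 0.00116, i.e. 100 writes/day
print(round(heartbeat / failures))   # 8640000
```

The transition-only scheme writes roughly 8.6 million times less often than per-host heartbeats under these assumptions.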

NOTE: If the up/down state can be determined from the DB alone, the
scheduler could eliminate the dead hosts in the first DB query, without
using the ComputeFilter as it is used now. (Plugins SHOULD be able to
extend the base hosts query.)
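A sketch of filtering out dead hosts in the first DB query, assuming the DB holds an up/down column per host. The hosts table, its columns, and the build_host_query plugin hook (a list of SQL fragments ANDed onto the base query) are all hypothetical, and SQLite stands in for the nova DB. Fragments are treated as trusted plugin SQL; values still go through placeholders.

```python
import sqlite3

BASE_QUERY = "SELECT host FROM hosts WHERE up = 1"


def build_host_query(extra_filters=()):
    # Hypothetical plugin hook: each filter is a (sql_fragment, params) pair
    # ANDed onto the base query, so dead hosts never reach the filter loop.
    sql = BASE_QUERY
    params = []
    for fragment, fragment_params in extra_filters:
        sql += " AND (" + fragment + ")"
        params.extend(fragment_params)
    return sql + " ORDER BY host", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (host TEXT PRIMARY KEY, up INTEGER NOT NULL, "
             "free_ram_mb INTEGER NOT NULL)")
conn.executemany("INSERT INTO hosts VALUES (?, ?, ?)", [
    ("node1", 1, 4096), ("node2", 0, 8192), ("node3", 1, 512)])

sql, params = build_host_query()
print([row[0] for row in conn.execute(sql, params)])  # ['node1', 'node3']

sql, params = build_host_query([("free_ram_mb >= ?", [1024])])
print([row[0] for row in conn.execute(sql, params)])  # ['node1']
```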

** Affects: nova
     Importance: Undecided
         Status: New

** Summary changed:

- zookeper driver used with O(n^2) complexity  by the scheduler
+ zookeeper driver used with O(n^2) complexity  by the scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1437199
