yahoo-eng-team team mailing list archive

[Bug 2063451] [NEW] Using the number of CPUs as a default for the workers leads to problems with big setups

 

Public bug reported:

Nova uses the CPU count as a default for the following worker settings,
which is problematic for people deploying on machines with a large
number of CPUs:

    [DEFAULT]
    osapi_compute_workers=
    metadata_workers=

    [conductor]
    workers=

    [scheduler]
    workers=

In our case, it is a setup with >100 CPUs where the huge number of
workers led to a lot of traffic to the cell1 database (MariaDB Galera)
for an otherwise empty OpenStack cluster, which in turn quickly filled
the database filesystem because of the growing MariaDB binlog. These
problems disappeared as soon as we explicitly configured the workers
for nova-scheduler and nova-conductor with a count of 8 each (we also
lowered the other workers for the sake of consistency).

I suggest that nova should apply a limit for the default. I couldn't
find guidelines for the worker counts in the nova docs – however,
according to other OpenStack projects, there seems to be some kind of
consensus on using a worker count way below 20:

* Kolla Ansible sets a maximum of 5 workers [1]
* puppet-openstacklib sets a maximum of 12 workers [2]

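As a rough illustration of such a limit, a capped default could look
like the sketch below. This is not nova's code: the helper name
default_workers and its parameters are hypothetical, and the cap of 8
is only borrowed from the value that worked in the setup described
above. The right limit would be up to the nova developers.

```python
import os


def default_workers(cap=8, cpu_count=None):
    """Return a worker-count default: the CPU count, bounded by cap.

    Hypothetical helper, not part of nova. cap=8 mirrors the value
    from the report above; the right limit is open for discussion.
    """
    if cpu_count is None:
        # os.cpu_count() may return None when the count is undetermined.
        cpu_count = os.cpu_count() or 1
    # Never go below one worker, never above the cap.
    return max(1, min(cpu_count, cap))


if __name__ == "__main__":
    for cpus in (2, 8, 128):
        print(cpus, "CPUs ->", default_workers(cpu_count=cpus), "workers")
```

With a bound like this, small hosts keep the current behaviour (one
worker per CPU), while a host with more than 100 CPUs would no longer
start more than 100 workers per service by default. An explicitly
configured worker count would still override the default.
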

[1] https://github.com/openstack/kolla-ansible/blob/5a663aec1dc6ede45a860eecab84af05cd06b67f/ansible/group_vars/all.yml#L742
[2] https://github.com/openstack/puppet-openstacklib/blob/master/lib/facter/os_workers.rb#L45

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2063451
