yahoo-eng-team team mailing list archive
Message #79500
[Bug 1838819] [NEW] Docs needed for tunables at large scale
Public bug reported:
Every once in a while, configuration options come up in IRC that need to be
tweaked at large scale (Blizzard, CERN, etc.): once you hit hundreds or
thousands of compute nodes, they need to be changed to avoid killing the
control plane.
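A rough way to see why node count matters: if every compute node runs the same periodic task once per interval, the control plane sees on average (number of nodes) / (interval in seconds) runs per second. The sketch below is a back-of-envelope model under that assumption (uniform, independent runs of equal cost). aggregate_rate is a hypothetical helper, and the node counts and intervals are illustrative, not nova defaults or measurements.

```python
def aggregate_rate(num_nodes: int, interval_s: float) -> float:
    """Average periodic-task runs per second seen across the deployment.

    Back-of-envelope model (assumption): each compute node runs the task once
    per interval_s seconds, independently of the others.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return num_nodes / interval_s


if __name__ == "__main__":
    # Illustrative node counts and intervals, not measurements.
    for nodes in (100, 1000):
        for interval in (60, 3600):
            print(f"{nodes} nodes, every {interval}s: "
                  f"{aggregate_rate(nodes, interval):.2f} runs/s")
```

Under this model, at 1000 nodes going from a 60 s to a 3600 s interval drops the average from about 16.7 to about 0.28 runs per second, and any per-run cost (API calls, DB writes) scales down the same way.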
One such option is this:
https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.heal_instance_info_cache_interval
From a Blizzard operator:
(3:04:18 PM) eandersson: mriedem, we had to set heal_instance_info_cache high because it was killing our control plane
(3:05:41 PM) eandersson: It was getting real heavy on large sites with 1k nodes
(3:06:26 PM) eandersson: We also ended up adding a variance
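One plausible reading of "adding a variance" is randomizing each node's interval so that a large fleet does not fire the task in lockstep. The following is a minimal sketch of that idea, assuming a symmetric random spread around the configured interval. jittered_interval and its variance parameter are hypothetical: they are not a nova option and not necessarily what Blizzard implemented.

```python
import random


def jittered_interval(base_s: float, variance: float, rng: random.Random) -> float:
    """Return base_s scaled by a random factor in [1 - variance, 1 + variance].

    Hypothetical helper: giving each node a slightly different delay before
    its next run spreads the load instead of having every node hit the
    control plane at the same moment.
    """
    if not 0 <= variance < 1:
        raise ValueError("variance must be in [0, 1)")
    return base_s * rng.uniform(1 - variance, 1 + variance)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the demo is repeatable
    # Five nodes with a 3600 s base interval and +/-20% spread (illustrative).
    delays = [jittered_interval(3600, 0.2, rng) for _ in range(5)]
    print([round(d) for d in delays])
```

With a 20% spread, runs that would all land on the same second are scattered over roughly a 24-minute window (2880 s to 4320 s).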
Similarly, CERN had to totally disable this one:
https://docs.openstack.org/nova/latest/configuration/config.html#compute.resource_provider_association_refresh
They rely instead on a SIGHUP or a restart of the service when they need
to refresh that cache.
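For reference, the two options live in different sections of nova.conf, matching the DEFAULT and compute anchors in the URLs above. The snippet below uses Python's standard configparser only to render an override fragment. The values are illustrative assumptions, not recommendations: 3600 stands in for "set it high", and 0 for the provider refresh assumes, per the option's help text, that 0 disables the periodic refresh. OVERRIDES and render_overrides are hypothetical names.

```python
import configparser
import io

# Illustrative override values (assumptions, not recommendations from this bug).
OVERRIDES = {
    # heal_instance_info_cache_interval lives in [DEFAULT]; 3600 stands in
    # for "set it high".
    "DEFAULT": {"heal_instance_info_cache_interval": "3600"},
    # resource_provider_association_refresh lives in [compute]; 0 is assumed
    # to disable the periodic refresh, per the option's help text.
    "compute": {"resource_provider_association_refresh": "0"},
}


def render_overrides(overrides: dict) -> str:
    """Render the overrides as INI text suitable for pasting into nova.conf.

    configparser only produces the text here; nova itself parses the file
    with oslo.config, whose [DEFAULT] handling differs (configparser would
    also expose DEFAULT keys inside every other section).
    """
    config = configparser.ConfigParser()
    config.read_dict(overrides)
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_overrides(OVERRIDES))
```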
We should document these tunables in the admin docs as we come across
them, so we don't forget about them when new operators/users come along
and hit scaling issues.
** Affects: nova
Importance: Undecided
Status: New
** Tags: docs performance
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1838819