[Bug 1484847] [NEW] image_cache_manager message storm
Public bug reported:
The image_cache_manager periodic task runs on behalf of the nova-compute (n-cpu) service.
image_cache_manager queries all instances that use the same filesystem as the local node.
(The query may cover all compute nodes in the region if they share the same pNFS filesystem.)
https://github.com/openstack/nova/blob/b91f3f60997dddb2f7c2fc007fe02b7dff1e0224/nova/compute/manager.py#L6333
After all instances have been received, it issues a looped query via RPC, one call per instance (each typically selecting a single row).
https://github.com/openstack/nova/blob/b91f3f60997dddb2f7c2fc007fe02b7dff1e0224/nova/virt/imagecache.py#L105
In the end, all it needs to know is which images are in use.
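For illustration, the looped pattern at the imagecache.py line above boils down to roughly the following (a simplified sketch, not the exact nova code):

    # Simplified sketch of the per-instance loop described above. Every
    # iteration is a separate RPC round-trip through nova-conductor, even
    # though the task ultimately only needs to know which images are in use.
    from nova import objects

    def _list_used_images(context, all_instances):
        used_images = {}
        for instance in all_instances:
            used_images.setdefault(instance.image_ref, []).append(instance.name)
            # One conductor/DB round-trip per instance, per compute node,
            # per periodic-task run.
            bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
                context, instance.uuid)
            # ... the real code then inspects bdms per instance ...
        return used_images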
With default settings on 1024 compute nodes sharing a filesystem, each hosting 16 VMs (16384 instances in total), we get
nr_nodes * nr_total_vms / interval_sec
1024 * 16384 / 2400 ≈ 6990.5 messages/sec.
That is enough to take down the nova-conductor queue.
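A quick back-of-envelope check of that figure (each of the 1024 nodes issues one BDM RPC per instance in the cluster every interval):

    nodes = 1024                   # compute nodes sharing the filesystem
    vms = nodes * 16               # 16 VMs per node -> 16384 instances in total
    interval = 2400                # image_cache_manager_interval default, in seconds

    print(nodes * vms / interval)  # ~6990.5 conductor messages per second, sustained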
https://github.com/openstack/nova/blob/b91f3f60997dddb2f7c2fc007fe02b7dff1e0224/nova/compute/manager.py#L6329
mentions some future refactoring, but that TODO note is ~3 years old.
The looped BlockDeviceMappingList messages MUST be eliminated!
One option is to move the whole statistics calculation to the service that has a direct DB connection
and is able to select multiple related BlockDeviceMapping records in a single query.
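As a rough illustration of that option, the per-instance loop could be replaced by one bulk lookup performed on the side that has the DB connection. This is only a sketch; the bulk helper name bdms_by_instance_uuid is assumed here for illustration, not a confirmed nova API.

    # Sketch of the batched alternative: one query for all instances' block
    # device mappings instead of one RPC per instance. The bulk helper
    # 'bdms_by_instance_uuid' is an assumed, illustrative name.
    from collections import defaultdict
    from nova import objects

    def used_images_for(context, all_instances):
        uuids = [inst.uuid for inst in all_instances]
        # Single round-trip for the whole instance list.
        all_bdms = objects.BlockDeviceMappingList.bdms_by_instance_uuid(
            context, uuids)

        bdms_per_instance = defaultdict(list)
        for bdm in all_bdms:
            bdms_per_instance[bdm.instance_uuid].append(bdm)

        used_images = defaultdict(list)
        for inst in all_instances:
            used_images[inst.image_ref].append(inst.name)
            # Per-instance inspection of bdms_per_instance[inst.uuid] happens
            # locally, with no further messaging.
        return used_images

With something like this, the message count per periodic run drops from roughly nr_nodes * nr_total_vms to nr_nodes.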
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1484847