yahoo-eng-team team mailing list archive

[Bug 1484847] Re: image_cache_manager message storm

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1484847@xxxxxxxxxxxxxxxxxx>
Date: Tue, 29 Mar 2016 14:44:28 -0000
Reply-to: Bug 1484847 <1484847@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Reviewed:  https://review.openstack.org/298023
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=990eafe4c40e736744494624ca06d137ff6d49ea
Submitter: Jenkins
Branch:    master

commit 990eafe4c40e736744494624ca06d137ff6d49ea
Author: Hans Lindgren <hanlind@xxxxxx>
Date:   Sat Mar 26 19:58:49 2016 +0100

    Reduce number of db calls during image cache manager periodic task
    
    Make a single db call to get bdms for all instances instead of one
    call per instance.
    
    Change-Id: I74864b398f2d17a24b9ed676945183401e9872a0
    Closes-Bug: #1484847


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1484847

Title:
  image_cache_manager message storm

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  The image_cache_manager periodic task runs on behalf of n-cpu.
  It queries all instances that use the same file system as the node it runs on.
  (The response may cover instances from every compute node in the region, if they all use the same shared pNFS.)

  https://github.com/openstack/nova/blob/b91f3f60997dddb2f7c2fc007fe02b7dff1e0224/nova/compute/manager.py#L6333

  After all instances are received, it queries block device mappings in a loop via RPC, one call per instance (typically SELECTs with a one-line response).
  https://github.com/openstack/nova/blob/b91f3f60997dddb2f7c2fc007fe02b7dff1e0224/nova/virt/imagecache.py#L105

  In the end, it only needs to know which images are in use.

  If we consider default settings on 1024 compute nodes with a shared
  filesystem, where each node hosts 16 VMs, we will have

  nr_cpu_node = 1024
  nr_vm_per_node = 16
  interval_sec = 2400
  nr_vm = nr_cpu_node * nr_vm_per_node = 16384
  messages_per_sec = nr_cpu_node * nr_vm / interval_sec

  1024 * 16384 / 2400 ≈ 6990.5 messages/sec.
  That load will take down the nova conductor queue.
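
  The estimate above can be reproduced with a short sketch. The function
  name is hypothetical; the inputs (1024 nodes, 16 VMs per node, a 2400
  second interval) are the figures used in this report, and the model
  assumes every node issues one message per instance on the shared
  filesystem during each interval.

```python
def bdm_messages_per_second(nr_cpu_node, nr_vm_per_node, interval_sec):
    """Estimate the per-instance query rate hitting the conductor.

    Assumption: each compute node sees every instance on the shared
    filesystem and sends one message per instance per interval.
    """
    nr_vm = nr_cpu_node * nr_vm_per_node      # instances on the shared storage
    return nr_cpu_node * nr_vm / interval_sec  # messages per second, all nodes


rate = bdm_messages_per_second(nr_cpu_node=1024, nr_vm_per_node=16,
                               interval_sec=2400)
print(f"{rate:.1f} messages/sec")
```

  Because both factors scale with the node count, the rate grows
  quadratically: doubling the number of nodes quadruples the load.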

  https://github.com/openstack/nova/blob/b91f3f60997dddb2f7c2fc007fe02b7dff1e0224/nova/compute/manager.py#L6329
  The code there mentions some future refactoring, but that TODO note is about 3 years old.

  The looped BlockDeviceMappingList messages MUST be eliminated!

  One option is to move the whole statistics calculation to a service
  that has a direct DB connection and is able to select multiple related
  BlockDeviceMapping records at once.
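
  The difference between the looped and the batched approach can be
  illustrated with a minimal, self-contained sketch. Everything here is
  hypothetical: FakeConductor, its two methods, and the row layout are
  stand-ins for illustration only and are not nova APIs. The point is
  only that one batched call yields the same set of used images as one
  call per instance.

```python
from collections import defaultdict


class FakeConductor:
    """Hypothetical stand-in for a service with DB access; counts round trips."""

    def __init__(self, bdm_rows):
        self._rows = bdm_rows  # list of (instance_uuid, image_id) tuples
        self.calls = 0

    def get_bdms_for_instance(self, instance_uuid):
        # Looped style: one round trip per instance.
        self.calls += 1
        return [row for row in self._rows if row[0] == instance_uuid]

    def get_bdms_for_instances(self, instance_uuids):
        # Batched style: one round trip for all instances.
        self.calls += 1
        wanted = set(instance_uuids)
        grouped = defaultdict(list)
        for row in self._rows:
            if row[0] in wanted:
                grouped[row[0]].append(row)
        return grouped


rows = [(f"vm-{i}", f"image-{i % 3}") for i in range(16)]
uuids = [uuid for uuid, _ in rows]

looped = FakeConductor(rows)
used_looped = {image for uuid in uuids
               for _, image in looped.get_bdms_for_instance(uuid)}

batched = FakeConductor(rows)
used_batched = {image
                for bdms in batched.get_bdms_for_instances(uuids).values()
                for _, image in bdms}

print(looped.calls, batched.calls, used_looped == used_batched)
```

  With 16 instances the looped variant makes 16 round trips and the
  batched variant makes one, while both arrive at the same set of images.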

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1484847/+subscriptions

References

[Bug 1484847] [NEW] image_cache_manager messsage storm
From: Attila Fazekas, 2015-08-14