yahoo-eng-team team mailing list archive

[Bug 1742747] [NEW] RT overrides default allocation_ratios for ram cpu and disk

 

Public bug reported:

Description
===========

The resource tracker (RT) overrides the default allocation ratio values
with values from the configuration files without checking whether those
values are valid.

Allocation ratio values are taken directly from the configuration files. This works unless an allocation ratio in the configuration file is 0.0, and that is exactly the problem, because the configuration options default to 0.0:
https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L397
So if an allocation ratio is set to 0.0 (or not set at all, because 0.0 is the default value), sending that ratio to placement in the RT update would fail.
*BUT here comes the solution*:
https://github.com/openstack/nova/blob/master/nova/objects/compute_node.py#L198

When we read the ComputeNode object from the DB, we also check whether
the ratios are 0.0; if so, we override them with defaults (CPU 16x,
RAM 1.5x, disk 1x).
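
The fallback described above can be sketched roughly as follows. This is a minimal illustration and not nova's actual code: the DEFAULT_RATIOS table and the apply_ratio_fallback helper are hypothetical names, and only the default values (16.0, 1.5, 1.0) come from this report.

```python
# Hypothetical defaults table; the values are the ones quoted in this report.
DEFAULT_RATIOS = {
    "cpu_allocation_ratio": 16.0,
    "ram_allocation_ratio": 1.5,
    "disk_allocation_ratio": 1.0,
}


def apply_ratio_fallback(db_values):
    """Return a copy of db_values with unset (0.0 or None) ratios defaulted."""
    result = dict(db_values)
    for key, default in DEFAULT_RATIOS.items():
        # A missing key, None, or 0.0 all mean "no usable ratio stored".
        if result.get(key) in (0.0, None):
            result[key] = default
    return result


loaded = apply_ratio_fallback({"cpu_allocation_ratio": 0.0,
                               "ram_allocation_ratio": 2.0})
print(loaded)
```

A ratio that was explicitly stored (2.0 for RAM here) is left alone; only the unset ones are replaced.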

But right after the ComputeNode object is initialized here:
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py?utf8=✓#L539
we copy the actual resources to it (via _copy_resources).

This replaces the allocation ratios on the ComputeNode with those taken
from the configuration file, which is fine in itself: if an operator
wants to change the ratios, they edit the conf file and restart the
service.

But what if the operator leaves those parameters untouched in the config? Here comes the problem!
Those parameters would always be set to 0.0. The placement API rejects that and raises:
InvalidInventoryCapacity: Invalid inventory for 'VCPU' on resource provider '52559824-5fb1-424b-a4cf-79da9199447d'. The reserved value is greater than or equal to total.
The exception is raised here:
https://github.com/openstack/nova/blob/master/nova/objects/resource_provider.py#L228
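
To illustrate why a ratio of 0.0 gets rejected, here is a small sketch. It assumes that placement computes usable capacity as (total - reserved) * allocation_ratio and refuses inventories whose capacity is not positive; that formula, the function names, and the use of ValueError are assumptions for illustration, not placement's actual code.

```python
def capacity(total, reserved, allocation_ratio):
    """Usable capacity under the assumed formula."""
    return int((total - reserved) * allocation_ratio)


def validate_inventory(resource_class, total, reserved, allocation_ratio):
    """Raise ValueError (a stand-in for InvalidInventoryCapacity) when the
    assumed capacity is zero or negative."""
    cap = capacity(total, reserved, allocation_ratio)
    if cap <= 0:
        raise ValueError(
            "Invalid inventory for %r: capacity is %d" % (resource_class, cap))
    return cap


print(validate_inventory("VCPU", total=8, reserved=0, allocation_ratio=16.0))
try:
    validate_inventory("VCPU", total=8, reserved=0, allocation_ratio=0.0)
except ValueError as exc:
    print(exc)
```

Under that assumption, a ratio of 0.0 yields a capacity of 0 even when reserved is well below total, which would explain why the error text talks about reserved and total although neither is the real cause.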


Code around the problem (pdb session):
> /opt/stack/nova/nova/compute/resource_tracker.py(610)
 602         def _copy_resources(self, compute_node, resources):
 603             """Copy resource values to supplied compute_node."""
 604             # purge old stats and init with anything passed in by the driver
 605             self.stats.clear()
 606             self.stats.digest_stats(resources.get('stats'))
 607             compute_node.stats = copy.deepcopy(self.stats)
 608
 609             # update the allocation ratios for the related ComputeNode object
 610  ->         compute_node.ram_allocation_ratio = self.ram_allocation_ratio
 611             compute_node.cpu_allocation_ratio = self.cpu_allocation_ratio
 612             compute_node.disk_allocation_ratio = self.disk_allocation_ratio
 613
 614             # now copy rest to compute_node
 615             compute_node.update_from_virt_driver(resources)
(Pdb++) self.cpu_allocation_ratio
0.0

self.cpu_allocation_ratio comes directly from config:
https://github.com/openstack/nova/blob/master/nova/conf/compute.py#L397
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py#L148
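
One possible way to avoid the override, shown only as a sketch: copy a configured ratio onto the compute node only when it is non-zero, so that the defaults applied on load survive. FakeComputeNode and copy_allocation_ratios are hypothetical stand-ins written for this illustration; they are not nova code and not necessarily the fix that should be adopted.

```python
from dataclasses import dataclass


@dataclass
class FakeComputeNode:
    """Stand-in for ComputeNode, pre-populated with the fallback defaults."""
    cpu_allocation_ratio: float = 16.0
    ram_allocation_ratio: float = 1.5
    disk_allocation_ratio: float = 1.0


def copy_allocation_ratios(compute_node, configured):
    """Apply configured ratios, skipping values the operator left unset."""
    for name, value in configured.items():
        # 0.0 (the config default) and None are treated as "not configured".
        if value:
            setattr(compute_node, name, value)
    return compute_node


node = copy_allocation_ratios(
    FakeComputeNode(),
    {"cpu_allocation_ratio": 0.0,    # untouched in the config
     "ram_allocation_ratio": 2.0,    # explicitly set by the operator
     "disk_allocation_ratio": 0.0},  # untouched in the config
)
print(node)
```

With this guard, only the RAM ratio changes; the CPU and disk ratios keep the defaults instead of being reset to 0.0.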


Environment
===========
Latest master


How to reproduce
================
1. Spawn devstack
2. Leave configuration files untouched
3. Observe the overrides in
https://github.com/openstack/nova/blob/master/nova/compute/resource_tracker.py?utf8=✓#L611
4. Watch the RT send the ratios to placement and placement respond with 400 Bad Request.

** Affects: nova
     Importance: Undecided
     Assignee: Maciej Jozefczyk (maciej.jozefczyk)
         Status: New

** Changed in: nova
     Assignee: (unassigned) => Maciej Jozefczyk (maciej.jozefczyk)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1742747
