yahoo-eng-team team mailing list archive

[Bug 1214943] Re: Live migration should use the same memory over subscription logic as instance boot

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Doug Hellmann <doug@xxxxxxxxxxxxxxxx>
Date: Thu, 03 Dec 2015 21:32:54 -0000
Reply-to: Bug 1214943 <1214943@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Changed in: nova
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1214943

Title:
  Live migration should use the same memory over subscription logic as
  instance boot

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  I encountered an issue when live-migrating an instance to a specified
  target host. I expected the operation to succeed, but it failed for
  the following reason:

  MigrationPreCheckError: Migration pre-check error: Unable to migrate
  a34f9b88-1e07-4798-af46-ca3b3dbaceda to hchenos2: Lack of
  memory(host:336 <= instance:512)

  1. My OpenStack cluster information:

  1) There are two compute nodes in my cluster, and I created 4
  instances (1 vCPU / 512 MB memory each) on these hosts.

  -----------
  mysql> select hypervisor_hostname,vcpus,vcpus_used,running_vms,memory_mb,memory_mb_used,free_ram_mb,deleted from compute_nodes where deleted=0;
  +----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
  | hypervisor_hostname              | vcpus | vcpus_used | running_vms | memory_mb | memory_mb_used | free_ram_mb | deleted |
  +----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
  | hchenos1.eng.platformlab.ibm.com |     2 |          2 |           2 |      1872 |           1536 |         336 |       0 |
  | hchenos2.eng.platformlab.ibm.com |     2 |          2 |           2 |      1872 |           1536 |         336 |       0 |
  +----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
  2 rows in set (0.00 sec)

  mysql> 
  ------------------------
  [root@hchenos ~]# nova list
  +--------------------------------------+------+--------+----------+
  | ID                                   | Name | Status | Networks |
  +--------------------------------------+------+--------+----------+
  | a34f9b88-1e07-4798-af46-ca3b3dbaceda | vm1  | ACTIVE |          |   >>> on host 'hchenos1'
  | f6aaeff9-2220-4693-8e5a-710f4c52b774 | vm2  | ACTIVE |          |   >>> on host 'hchenos2'
  | bbee57a2-81cd-4933-a943-1c2272f7f550 | vm4  | ACTIVE |          |   >>> on host 'hchenos1'
  | 74fe26ec-919c-4fa7-890f-f59abe09ef4f | vm5  | ACTIVE |          |   >>> on host 'hchenos2'
  +--------------------------------------+------+--------+----------+
  [root@hchenos ~]# 

  2) I also enabled the ComputeFilter, RamFilter and CoreFilter in
  nova.conf, but did not configure an overcommit ratio for either vCPU
  or memory, so the default ratios are used.

  2. Under the above conditions, live-migrating instance vm1 to hchenos2
  failed:

  [root@hchenos ~]# nova live-migration vm1 hchenos2
  ERROR: Live migration of instance a34f9b88-1e07-4798-af46-ca3b3dbaceda to host hchenos2 failed (HTTP 400) (Request-ID: req-68244b99-e438-4000-8bdb-cc43b275c018)

  Conductor log:
  ...
  ckages/nova/conductor/tasks/live_migrate.py", line 87, in _check_requested_destination
      self._check_destination_has_enough_memory()

    File "/usr/lib/python2.6/site-packages/nova/conductor/tasks/live_migrate.py", line 108, in _check_destination_has_enough_memory
      mem_inst=mem_inst))

  MigrationPreCheckError: Migration pre-check error: Unable to migrate a34f9b88-1e07-4798-af46-ca3b3dbaceda to hchenos2: Lack of memory(host:336 <= instance:512)

  I think the reason is as follows:

  The free_ram_mb for 'hchenos2' is 336 MB and the requested memory is
  512 MB, so the operation fails.

  free_ram_mb = memory_mb (1872) - reserved_host_memory_mb (512) -
  2 * 512 (instance usage) = 336
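  The failing pre-check can be sketched as follows. This is a minimal, hypothetical model that assumes the check compares the destination's free_ram_mb directly against the instance's memory, as the logged message suggests; the function and exception class are illustrative stand-ins, not Nova's code.

```python
# Hypothetical, simplified model of the destination memory pre-check described
# in this report; not Nova's actual code. The comparison and message mirror the
# logged error "Lack of memory(host:336 <= instance:512)".


class MigrationPreCheckError(Exception):
    """Stand-in for Nova's exception of the same name."""


def check_destination_has_enough_memory(free_ram_mb, mem_inst, instance, dest):
    # free_ram_mb is used as-is: no ram_allocation_ratio is applied.
    if free_ram_mb <= mem_inst:
        raise MigrationPreCheckError(
            "Unable to migrate %s to %s: Lack of memory(host:%d <= instance:%d)"
            % (instance, dest, free_ram_mb, mem_inst))


# hchenos2 values from the first compute_nodes table.
memory_mb = 1872
reserved_host_memory_mb = 512
free_ram_mb = memory_mb - reserved_host_memory_mb - 2 * 512  # 336

try:
    check_destination_has_enough_memory(free_ram_mb, 512, "vm1", "hchenos2")
    result = "passed"
except MigrationPreCheckError as exc:
    result = str(exc)

print(result)
```

  With 336 MB free and 512 MB requested, the sketch rejects the migration with the same numbers seen in the conductor log.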

  
  3. However, booting an instance on 'hchenos2' succeeds:

  [root@hchenos ~]# nova boot --image cirros-0.3.0-x86_64 --flavor 1 --availability-zone nova:hchenos2 xhu

  [root@hchenos ~]# nova list
  +--------------------------------------+------+--------+----------+
  | ID                                   | Name | Status | Networks |
  +--------------------------------------+------+--------+----------+
  | a34f9b88-1e07-4798-af46-ca3b3dbaceda | vm1  | ACTIVE |          |
  | f6aaeff9-2220-4693-8e5a-710f4c52b774 | vm2  | ACTIVE |          |
  | bbee57a2-81cd-4933-a943-1c2272f7f550 | vm4  | ACTIVE |          |
  | 74fe26ec-919c-4fa7-890f-f59abe09ef4f | vm5  | ACTIVE |          |
  | 364d1a01-67ed-4966-bbfd-d21b6bc3067c | xhu  | ACTIVE |          |   >>>>  is active
  +--------------------------------------+------+--------+----------+
  [root@hchenos ~]#

  mysql> select hypervisor_hostname,vcpus,vcpus_used,running_vms,memory_mb,memory_mb_used,free_ram_mb,deleted from compute_nodes where deleted=0;
  +----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
  | hypervisor_hostname              | vcpus | vcpus_used | running_vms | memory_mb | memory_mb_used | free_ram_mb | deleted |
  +----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
  | hchenos1.eng.platformlab.ibm.com |     2 |          2 |           2 |      1872 |           1536 |         336 |       0 |
  | hchenos2.eng.platformlab.ibm.com |     2 |          3 |           3 |      1872 |           2048 |        -176 |       0 |
  +----------------------------------+-------+------------+-------------+-----------+----------------+-------------+---------+
  2 rows in set (0.00 sec)

  mysql>
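  The numbers in the second table follow the same accounting as before. A simplified model, assuming memory_mb_used starts at reserved_host_memory_mb and each instance adds its flavor memory (the helper name is hypothetical, not Nova's resource tracker):

```python
# Simplified model of the per-host memory accounting implied by the
# compute_nodes tables above (an assumption, not Nova's resource tracker):
# memory_mb_used starts at reserved_host_memory_mb, each instance adds its
# flavor memory, and free_ram_mb is whatever is left (it may go negative).


def host_memory_usage(memory_mb, reserved_host_memory_mb, instance_mem_mbs):
    memory_mb_used = reserved_host_memory_mb + sum(instance_mem_mbs)
    free_ram_mb = memory_mb - memory_mb_used
    return memory_mb_used, free_ram_mb


before = host_memory_usage(1872, 512, [512, 512])       # vm2, vm5
after = host_memory_usage(1872, 512, [512, 512, 512])   # vm2, vm5, xhu
print(before, after)  # (1536, 336) (2048, -176)
```

  A negative free_ram_mb on hchenos2 is only reachable because the boot path admitted 'xhu' beyond physical memory, i.e. with memory overcommit.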

  So I'm confused by the above results: why does booting an instance on
  'hchenos2' succeed, while live-migrating an instance to the same host
  fails due to "not enough memory"?

  After carefully going through the Nova source code (live_migrate.py:
  execute()), I think the following points cause this issue:

  1) The function '_check_destination_has_enough_memory' does not
  consider the RAM allocation ratio (default value 1.5) when calculating
  the host's free memory ('free_ram_mb'). This is inconsistent with the
  memory check that 'RamFilter' performs when booting an instance.

  I think the free memory of host 'hchenos2' should instead be:

  free_ram_mb = memory_mb (1872) * ram_allocation_ratio (1.5) -
  memory_mb_used (1536) = 1272
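  The ratio-aware check proposed above can be sketched like this. It is a simplified model of the formula in this report, not RamFilter's source; the helper names are hypothetical, and the 1.5 default is the ram_allocation_ratio value stated above.

```python
# A minimal sketch of a ratio-aware free-memory check, following the formula
# proposed above; a simplified model, not RamFilter's actual source.


def usable_ram_mb(memory_mb, memory_mb_used, ram_allocation_ratio=1.5):
    # Overcommitted capacity minus what is already allocated.
    return memory_mb * ram_allocation_ratio - memory_mb_used


def has_enough_memory(memory_mb, memory_mb_used, requested_mb,
                      ram_allocation_ratio=1.5):
    return usable_ram_mb(memory_mb, memory_mb_used,
                         ram_allocation_ratio) >= requested_mb


# hchenos2 before 'xhu' was booted.
print(usable_ram_mb(1872, 1536))            # 1272.0
print(has_enough_memory(1872, 1536, 512))   # True
# With no overcommit (ratio 1.0) the outcome matches the failing pre-check.
print(has_enough_memory(1872, 1536, 512, ram_allocation_ratio=1.0))  # False
```

  Applying the same ratio in both the boot path and the live-migration pre-check would make the two decisions agree.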

  2) Why does the live-migration path only check that the target host
  has enough memory, and not check vCPUs as well?

  live_migrate.py: execute()

          self._check_instance_is_running()
          self._check_host_is_up(self.source)

          if not self.destination:
              self.destination = self._find_destination()
          else:
              # A destination was specified, so this branch is taken.
              self._check_requested_destination()

      def _check_requested_destination(self):
          self._check_destination_is_not_source()
          self._check_host_is_up(self.destination)
          # Only memory is checked here; why not vCPUs as well?
          self._check_destination_has_enough_memory()
          self._check_compatible_with_source_hypervisor(self.destination)
          self._call_livem_checks_on_host(self.destination)
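  A vCPU check for the requested destination could mirror the memory check. This is a hypothetical sketch in the spirit of CoreFilter; the function name is illustrative, and the 16.0 default for cpu_allocation_ratio is an assumption about this Nova release, not something stated in this report.

```python
# Hypothetical vCPU check for a requested live-migration destination, in the
# spirit of CoreFilter. The 16.0 default cpu_allocation_ratio is an assumption.


def has_enough_vcpus(vcpus, vcpus_used, requested_vcpus,
                     cpu_allocation_ratio=16.0):
    vcpus_limit = vcpus * cpu_allocation_ratio
    return vcpus_limit - vcpus_used >= requested_vcpus


# hchenos2: 2 vCPUs, 2 in use; vm1 needs 1.
print(has_enough_vcpus(2, 2, 1))                            # True
print(has_enough_vcpus(2, 2, 1, cpu_allocation_ratio=1.0))  # False
```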

  3) The VM state should be considered as well. For example, on the KVM platform a powered-off instance no longer consumes compute node resources (unlike on IBM PowerVM), but resource_tracker.py:_update_usage_from_instances() only takes the instance's 'deleted' flag
  into account when calculating resource usage.
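  The suggestion in point 3 can be illustrated with a small sketch. The dict fields 'deleted', 'vm_state' and 'memory_mb', the state name 'stopped', and the skip_stopped switch (for hypervisors such as KVM, but not PowerVM) are hypothetical; this is not Nova's data model or resource tracker.

```python
# Sketch of the reporter's suggestion: count an instance's memory toward host
# usage only if it is not deleted and, optionally, not powered off. Field
# names and the 'stopped' state are illustrative assumptions.


def memory_used_mb(instances, reserved_host_memory_mb, skip_stopped=False):
    used = reserved_host_memory_mb
    for inst in instances:
        if inst["deleted"]:
            continue  # current behavior: only deleted instances are skipped
        if skip_stopped and inst["vm_state"] == "stopped":
            continue  # proposed: powered-off instances free their memory
        used += inst["memory_mb"]
    return used


instances = [
    {"memory_mb": 512, "deleted": False, "vm_state": "active"},
    {"memory_mb": 512, "deleted": False, "vm_state": "stopped"},
]
print(memory_used_mb(instances, 512))                     # 1536
print(memory_used_mb(instances, 512, skip_stopped=True))  # 1024
```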

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1214943/+subscriptions