
yahoo-eng-team team mailing list archive

[Bug 1794985] [NEW] [QUEEN]: block migration is pretty unusable, blockjob bandwidth is limited to 1Mbps

 

Public bug reported:

Hi!

After upgrading from Pike to Queens I ran into very low block migration bandwidth (about 1 Mbps). I dug into the code a bit, and the issue appears to come from the shiny new "dynamic migration speed" changes.
See the _live_migration function in /usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py, line 7371:

        if events:
            # We start migration with the minimum bandwidth
            # speed. Depending on the VIF type (see:
            # _get_neutron_events_for_live_migration) we will wait for
            # Neutron to send events that confirm network is setup or
            # directly configure QEMU to use the maximun BW allowed.
            bandwidth = MIN_MIGRATION_SPEED_BW
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        else:
            bandwidth = CONF.libvirt.live_migration_bandwidth
...
        else:
            if utils.is_neutron() and events:
                LOG.debug('VIF events received, continuing migration '
                          'with max bandwidth configured: %d',
                          CONF.libvirt.live_migration_bandwidth,
                          instance=instance)
                # Configure QEMU to use the maximum bandwidth allowed.
                guest.migrate_configure_max_speed(
                    CONF.libvirt.live_migration_bandwidth)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
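
To make the flow above easier to follow, here is a minimal, self-contained sketch of the two-phase bandwidth selection. It is not nova's code: the function names are hypothetical, and the value 1 for MIN_MIGRATION_SPEED_BW is an assumption based on the roughly 1 Mbps observed in this report.

```python
# Illustrative model only; names and the constant's value are assumptions,
# not copied from nova.
MIN_MIGRATION_SPEED_BW = 1  # assumed minimum, in libvirt bandwidth units


def initial_bandwidth(events, configured_bandwidth):
    """Return the bandwidth the migration job is started with."""
    if events:
        # Neutron events are expected, so start at the minimum speed.
        return MIN_MIGRATION_SPEED_BW
    return configured_bandwidth


def bandwidth_after_vif_events(is_neutron, events, configured_bandwidth):
    """Return the bandwidth requested once VIF events arrive.

    Returns None when no change is requested.
    """
    if is_neutron and events:
        return configured_bandwidth
    return None


if __name__ == "__main__":
    pending = ["network-vif-plugged"]
    print(initial_bandwidth(pending, 0))
    print(bandwidth_after_vif_events(True, pending, 0))
```

With events pending and a configured bandwidth of 0, the job starts at the minimum and is only later asked to switch to 0, which is the sequence discussed in the rest of this report.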

First of all, AFAICS migrate_configure_max_speed only changes the bandwidth of the migration operation, not of the block mirroring operation, so the block job stays stuck at 1 Mbps forever.
I tried to fix this for myself the "right" way, but the snippet below failed to acquire a lock and crashed. That looks like a separate libvirt issue.

               for path in disk_paths:
                   LOG.warning('>>>> Try to raise bandwidth for %s to %d?',
                               path, CONF.libvirt.live_migration_bandwidth,
                               instance=instance)
                   dev = guest.get_block_device(path)
                   dev.blockjob_configure_max_speed(10000)
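
For comparison, a sketch of the same idea written against libvirt-python directly. It assumes `dom` is a `libvirt.virDomain` and uses its `blockJobSetSpeed(disk, bandwidth, flags)` call; the helper name and the `FakeDomain` stand-in are hypothetical and exist only so the example runs without a hypervisor. Nothing here is known to avoid the lock failure described above, so errors are collected rather than raised.

```python
def raise_block_job_speed(dom, disk_targets, bandwidth, flags=0):
    """Ask libvirt to raise the block job speed on each disk.

    Returns a dict mapping each disk target to None on success or to
    the error message on failure.
    """
    results = {}
    for target in disk_targets:
        try:
            dom.blockJobSetSpeed(target, bandwidth, flags)
            results[target] = None
        except Exception as exc:  # libvirt.libvirtError with a real domain
            results[target] = str(exc)
    return results


class FakeDomain(object):
    """Hypothetical stand-in for libvirt.virDomain; it only records calls."""

    def __init__(self):
        self.calls = []

    def blockJobSetSpeed(self, disk, bandwidth, flags=0):
        self.calls.append((disk, bandwidth, flags))
        return 0


if __name__ == "__main__":
    dom = FakeDomain()
    print(raise_block_job_speed(dom, ["vda"], 10000))
    print(dom.calls)
```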

Finally,
guest.migrate_configure_max_speed(CONF.libvirt.live_migration_bandwidth)
gives me a bandwidth of '0', which is fine if the migration job is
started with it. But, at least for me, once the speed has been set to
1 Mbps, a later '0' changes nothing at all. Just to be sure, I switched
off the 'if events:' branch, and the migration time of a 3G-backed VM
went back to a reasonable 18s (10 Gbps net). With the new logic, however,
even if I 'fix' the block mirror issue by hand and raise the block speed
with qemu-monitor:

# virsh qemu-monitor-command --pretty instance-0000006f '{ "execute":
"block-job-set-speed","arguments": { "device": "drive-virtio-disk0",
"speed": 10737418240 } }'

the backing storage migrates in seconds, but the subsequent memory
migration may take from 10 to 100 min. After being kicked with

# virsh migrate-setspeed instance-0000006f 100000000

it migrates almost immediately.
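
The two manual workarounds take their speed in different units: according to the QEMU and libvirt documentation, QMP's block-job-set-speed expects bytes per second, while virsh migrate-setspeed expects MiB/s. The sketch below builds the QMP payload used above and converts a link speed to MiB/s; both helper names are hypothetical.

```python
import json

MIB = 1024 * 1024
GIB = 1024 * MIB


def block_job_set_speed_cmd(device, bytes_per_sec):
    """Build the JSON payload for QMP block-job-set-speed."""
    return json.dumps({
        "execute": "block-job-set-speed",
        "arguments": {"device": device, "speed": bytes_per_sec},
    })


def gbps_to_mib_per_s(gbps):
    """Convert a link speed in Gbit/s to MiB/s."""
    return gbps * 1e9 / 8 / MIB


if __name__ == "__main__":
    # 10 GiB/s in bytes is the 10737418240 used in the command above.
    print(block_job_set_speed_cmd("drive-virtio-disk0", 10 * GIB))
    # A 10 Gbps link moves at most roughly this many MiB/s.
    print(round(gbps_to_mib_per_s(10)))
```

By this arithmetic, the 100000000 passed to migrate-setspeed is far above what a 10 Gbps link can carry, so in practice it acts as "no limit".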

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1794985

