yahoo-eng-team team mailing list archive

[Bug 1960345] Re: Nova documentation isn't clear enough about live_migration_downtime behavior

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: OpenStack Infra <1960345@xxxxxxxxxxxxxxxxxx>
Date: Wed, 23 Feb 2022 20:12:11 -0000
Reply-to: Bug 1960345 <1960345@xxxxxxxxxxxxxxxxxx>
Sender: noreply@xxxxxxxxxxxxx

Reviewed:  https://review.opendev.org/c/openstack/nova/+/828387
Committed: https://opendev.org/openstack/nova/commit/de110b042d8e340d19a52b9fb7ef6f4c52bc0762
Submitter: "Zuul (22348)"
Branch:    master

commit de110b042d8e340d19a52b9fb7ef6f4c52bc0762
Author: Pedro Almeida <pedro.monteiroazevedodemouraalmeida@xxxxxxxxxxxxx>
Date:   Tue Feb 8 14:51:46 2022 -0300

    Update live_migration_downtime definition
    
    Previously, the definition of live_migration_downtime did not
    explain whether an exception or timeout occurs when the migration
    exceeds the value. Nova uses the value only as a reference: if a
    problem happens while the VM is paused, there is no abort or
    force-complete.
    
    Closes-Bug: #1960345
    Signed-off-by: Pedro Almeida <pedro.monteiroazevedodemouraalmeida@xxxxxxxxxxxxx>
    Change-Id: I336481d1801a367b5628fedcd2aa5f5cf763355a


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960345

Title:
  Nova documentation isn't clear enough about live_migration_downtime
  behavior

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  https://docs.openstack.org/nova/xena/admin/configuring-migrations.html
  says that

  "live_migration_downtime sets the maximum permitted downtime for a
  live migration, in milliseconds. The default is 500."

  but it is not clear about what happens (or *if* anything happens)
  when that "maximum permitted downtime" is exceeded. It seems that no
  timeout action is tied to the downtime, and IMO the current wording
  misleads users into thinking there is one.

  Downtime increased to max:

  nova-compute-controller-0-937646f6-9q4n9 nova-compute 2022-02-02
  16:59:44.477 1552666 INFO nova.virt.libvirt.migration [-] [instance:
  5d91f6cc-dcc4-4f1f-8285-0b682284ac35] Increasing downtime to 100 ms
  after 72 sec elapsed time

  Downtime being exceeded:

  (controller-0) 2022-02-02 17:00:06.503+0000: 3737473: debug :
  qemuProcessHandleStop:674 : Transitioned guest instance-0000000b to
  paused state, reason migration

  (controller-1) 2022-02-02 17:00:06.613+0000: 4075521: debug :
  qemuProcessHandleResume:726 : Transitioned guest instance-0000000b out
  of paused into resumed state
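
  As a rough check, the pause length can be estimated from the two
  libvirt timestamps above. This is only a sketch: the two lines come
  from different hosts (controller-0 and controller-1), so the result
  assumes their clocks are in sync, and the helper name
  paused_interval_ms is made up for illustration.

```python
from datetime import datetime

# Timestamp layout of the libvirt debug lines quoted above,
# e.g. "2022-02-02 17:00:06.503+0000".
LIBVIRT_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"


def paused_interval_ms(paused_at: str, resumed_at: str) -> float:
    """Return the time between the pause and resume log lines, in ms."""
    start = datetime.strptime(paused_at, LIBVIRT_TS_FORMAT)
    end = datetime.strptime(resumed_at, LIBVIRT_TS_FORMAT)
    return (end - start).total_seconds() * 1000.0


# Values copied from the controller-0 (paused) and controller-1 (resumed) lines.
interval = paused_interval_ms(
    "2022-02-02 17:00:06.503+0000",
    "2022-02-02 17:00:06.613+0000",
)
print(f"guest paused for about {interval:.0f} ms")
```

  With the timestamps above this gives about 110 ms, which is already
  above the 100 ms that nova reported stepping the downtime up to.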

  Also in the libvirt logs:

  2022-02-02 17:00:06.579+0000: 3737473: info : qemuMonitorJSONIOProcessLine:217 : QEMU_MONITOR_RECV_REPLY: mon=0x7f92e00f4ae0 reply={"return": {"status": "completed", "setup-time": 190, "downtime": 179, "total-time": 91054, "ram": {"total": 8594989056, "postcopy-requests": 0, "dirty-sync-count": 128, "multifd-bytes": 0, "page-size": 4096, "remaining": 0, "mbps": 7196.567648, "transferred": 81738286801, "duplicate": 870668, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 81571127296, "normal": 19914826}}, "id": "libvirt-201"}
      <downtime>75</downtime>
      <downtime>109</downtime>
      <downtime>109</downtime>
      <downtime>109</downtime>

  And there was no timeout exception.
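
  The comparison can also be made directly against the downtime that
  QEMU itself reported. The sketch below parses a trimmed copy of the
  monitor reply quoted above (the "ram" statistics are left out) and
  compares it with the 100 ms from the nova log line. The function name
  and the choice of 100 ms as the limit are assumptions made for this
  example; nothing like this check exists in nova, which is the point
  of this bug.

```python
import json

# Trimmed copy of the QEMU_MONITOR_RECV_REPLY payload quoted above;
# the "ram" statistics are omitted for brevity.
REPLY = (
    '{"return": {"status": "completed", "setup-time": 190, '
    '"downtime": 179, "total-time": 91054}, "id": "libvirt-201"}'
)


def downtime_overrun_ms(reply_json: str, limit_ms: int) -> int:
    """Return how far the reported downtime exceeded limit_ms (0 if it did not)."""
    stats = json.loads(reply_json)["return"]
    return max(0, stats["downtime"] - limit_ms)


# 100 ms is taken from the "Increasing downtime to 100 ms" nova log line.
overrun = downtime_overrun_ms(REPLY, limit_ms=100)
print(f"reported downtime exceeded the limit by {overrun} ms")
```

  Here QEMU reports 179 ms of downtime, 79 ms over the 100 ms value,
  and the migration still finished with status "completed".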

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1960345/+subscriptions

References

[Bug 1960345] [NEW] Nova documentation isn't clear enough about live_migration_downtime behavior
From: Pedro Monteiro Azevedo de Moura Almeida, 2022-02-08