yahoo-eng-team team mailing list archive
Message #88351
[Bug 1960345] Re: Nova documentation isn't clear enough about live_migration_downtime behavior
Reviewed: https://review.opendev.org/c/openstack/nova/+/828387
Committed: https://opendev.org/openstack/nova/commit/de110b042d8e340d19a52b9fb7ef6f4c52bc0762
Submitter: "Zuul (22348)"
Branch: master
commit de110b042d8e340d19a52b9fb7ef6f4c52bc0762
Author: Pedro Almeida <pedro.monteiroazevedodemouraalmeida@xxxxxxxxxxxxx>
Date: Tue Feb 8 14:51:46 2022 -0300
Update live_migration_downtime definition
Previously, the definition of live_migration_downtime did not explain
whether any exception or timeout occurs when the migration exceeds the value.
The value is only used as a reference by nova; if any problem happens
while the VM is paused, there will be no abort or force-complete.
Closes-Bug: #1960345
Signed-off-by: Pedro Almeida <pedro.monteiroazevedodemouraalmeida@xxxxxxxxxxxxx>
Change-Id: I336481d1801a367b5628fedcd2aa5f5cf763355a
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960345
Title:
Nova documentation isn't clear enough about live_migration_downtime
behavior
Status in OpenStack Compute (nova):
Fix Released
Bug description:
https://docs.openstack.org/nova/xena/admin/configuring-migrations.html
says that
"live_migration_downtime sets the maximum permitted downtime for a
live migration, in milliseconds. The default is 500."
but it's not clear enough about what happens (or *whether* anything
happens) if that "maximum permitted downtime" is exceeded. There seems to be
no timeout action tied to the downtime, and IMO the current wording
misleads users into thinking there is.
Downtime increased to max:
nova-compute-controller-0-937646f6-9q4n9 nova-compute 2022-02-02 16:59:44.477 1552666 INFO nova.virt.libvirt.migration [-] [instance: 5d91f6cc-dcc4-4f1f-8285-0b682284ac35] Increasing downtime to 100 ms after 72 sec elapsed time
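The "Increasing downtime" message suggests that nova raises the permitted downtime in steps rather than enforcing a deadline. The sketch below models such a ramp. It is hypothetical: the function name, the parameter names and the linear shape are assumptions for illustration, with the step count and per-GiB delay assumed to correspond to nova's `live_migration_downtime_steps` and `live_migration_downtime_delay` options. Each yielded value is only a target handed to the hypervisor; nothing in the schedule aborts a migration whose actual pause runs longer.

```python
from typing import Iterator, Tuple


def downtime_schedule(max_downtime_ms: int, steps: int,
                      delay_per_gib_s: float,
                      data_gib: float) -> Iterator[Tuple[int, int]]:
    """Yield (elapsed_seconds, downtime_ms) pairs for a linear downtime ramp.

    Hypothetical sketch: the linear shape and the parameter names are
    assumptions for illustration, not nova's actual implementation.
    """
    # Wait longer between increases when there is more data to copy.
    delay = int(delay_per_gib_s * data_gib)
    # Start at a fraction of the maximum and climb evenly up to the maximum.
    base = max_downtime_ms / steps
    offset = (max_downtime_ms - base) / steps
    for i in range(steps + 1):
        # The value is a target for the hypervisor, not a timeout.
        yield delay * i, int(base + offset * i)


if __name__ == "__main__":
    for elapsed, downtime in downtime_schedule(400, 10, 30, 3):
        print(f"{elapsed:4d} s -> target downtime {downtime} ms")
```

With these sample numbers the target climbs from 40 ms to 400 ms over 900 seconds. The last step always equals the configured maximum, which fits the log above reaching its 100 ms "max".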
Downtime being exceeded:
(controller-0) 2022-02-02 17:00:06.503+0000: 3737473: debug : qemuProcessHandleStop:674 : Transitioned guest instance-0000000b to paused state, reason migration
(controller-1) 2022-02-02 17:00:06.613+0000: 4075521: debug : qemuProcessHandleResume:726 : Transitioned guest instance-0000000b out of paused into resumed state
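The pause window can be estimated by subtracting these two libvirt timestamps. The helper name below is hypothetical. Because the timestamps come from two different hosts (controller-0 and controller-1), any clock skew between them shifts the result, so treat it as an approximation.

```python
from datetime import datetime, timedelta

# Format of the libvirt debug timestamps quoted above, e.g. "2022-02-02 17:00:06.503+0000".
LIBVIRT_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"


def pause_window_ms(stop_ts: str, resume_ts: str) -> int:
    """Whole milliseconds between the source pause and the destination resume."""
    stop = datetime.strptime(stop_ts, LIBVIRT_TS_FORMAT)
    resume = datetime.strptime(resume_ts, LIBVIRT_TS_FORMAT)
    # timedelta // timedelta yields an int, avoiding float rounding.
    return (resume - stop) // timedelta(milliseconds=1)


if __name__ == "__main__":
    observed = pause_window_ms("2022-02-02 17:00:06.503+0000",
                               "2022-02-02 17:00:06.613+0000")
    max_downtime_ms = 100  # the maximum the nova log above ramped up to
    print(observed, observed > max_downtime_ms)
```

This gives a window of roughly 110 ms, above the 100 ms maximum, yet the migration was neither aborted nor force-completed.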
Also in the libvirt logs:
2022-02-02 17:00:06.579+0000: 3737473: info : qemuMonitorJSONIOProcessLine:217 : QEMU_MONITOR_RECV_REPLY: mon=0x7f92e00f4ae0 reply={"return": {"status": "completed", "setup-time": 190, "downtime": 179, "total-time": 91054, "ram": {"total": 8594989056, "postcopy-requests": 0, "dirty-sync-count": 128, "multifd-bytes": 0, "page-size": 4096, "remaining": 0, "mbps": 7196.567648, "transferred": 81738286801, "duplicate": 870668, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 81571127296, "normal": 19914826}}, "id": "libvirt-201"}
<downtime>75</downtime>
<downtime>109</downtime>
<downtime>109</downtime>
<downtime>109</downtime>
And there was no timeout exception.
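QEMU's own migration reply in the libvirt log can be checked the same way. The sketch below uses hypothetical helper names and assumes everything after `reply=` in the log line is a single JSON object, as in the line quoted above. It extracts the reply and compares its `downtime` field against the 100 ms maximum: the reply reports 179 ms while `status` is still `completed`.

```python
import json

MAX_DOWNTIME_MS = 100  # the maximum the reporter's nova log ramped up to

# Trimmed copy of the libvirt log line quoted in the bug report.
SAMPLE_LINE = ('2022-02-02 17:00:06.579+0000: 3737473: info : '
               'qemuMonitorJSONIOProcessLine:217 : QEMU_MONITOR_RECV_REPLY: '
               'mon=0x7f92e00f4ae0 reply={"return": {"status": "completed", '
               '"setup-time": 190, "downtime": 179, "total-time": 91054}, '
               '"id": "libvirt-201"}')


def parse_monitor_reply(log_line: str) -> dict:
    """Return the "return" object from a QEMU_MONITOR_RECV_REPLY log line.

    Hypothetical helper: assumes everything after 'reply=' is one JSON object.
    """
    _, sep, payload = log_line.partition("reply=")
    if not sep:
        raise ValueError("log line has no 'reply=' payload")
    return json.loads(payload)["return"]


if __name__ == "__main__":
    reply = parse_monitor_reply(SAMPLE_LINE)
    # The migration completed even though the reported downtime exceeds the maximum.
    print(reply["status"], reply["downtime"], reply["downtime"] > MAX_DOWNTIME_MS)
```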
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1960345/+subscriptions