Message #95158
[Bug 2093334] [NEW] libvirt - I/O errors after modifying fs.aio-max-nr
Public bug reported:
Description
===========
After increasing the fs.aio-max-nr parameter to 1048576, we observe issues while migrating instances or attaching new volumes to instances. The default value was 65536 (in my case I had reached that limit, which is why I increased the parameter).
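As a minimal sketch (assuming only the standard procfs interface on the compute host), the current AIO context usage can be compared against the limit like this:
------
import pathlib

# Read the number of AIO contexts currently in use and the system-wide limit
# from procfs on the compute host.
aio_nr = int(pathlib.Path("/proc/sys/fs/aio-nr").read_text())
aio_max = int(pathlib.Path("/proc/sys/fs/aio-max-nr").read_text())

print(f"aio-nr={aio_nr}, aio-max-nr={aio_max}, headroom={aio_max - aio_nr}")
------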
The symptoms vary; most of the errors come from libvirt. These errors are
tagged as qemuDomainBlockJobAbort:14400 in
/var/log/kolla/libvirt/libvirtd.log
The guest OS can report I/O errors too:
------
I/O error, dev vdb, sector 419430272 op (...)
Buffer I/O error on dev vdb, logical block 52428784, async page read
-----
Below are the logs captured while migrating an instance:
-------------------
2025-01-03 11:56:36.696+0000: 2052655: error : qemuDomainBlockJobAbort:14400 : invalid argument: disk vdb does not have an active block job
2025-01-03 11:56:36.750+0000: 2052655: error : qemuDomainBlockJobAbort:14400 : invalid argument: disk vdb does not have an active block job
2025-01-03 11:58:37.800+0000: 2052656: error : qemuDomainBlockJobAbort:14400 : invalid argument: disk vdc does not have an active block job
2025-01-03 11:58:37.853+0000: 2052656: error : qemuDomainBlockJobAbort:14400 : invalid argument: disk vdc does not have an active block job
Jan 7 16:31:00 comp-b16 nova-compute: 2025-01-07 16:31:00.947 7
WARNING os_brick.initiator.connectors.base [req-a642bf63-afe1-4735-9484-6996f0c6a12a req-517426df-68df-462d-9b0b-09c1c3614010 2fa3eaeea47247778d2e5d9e622100bf 995bbd9fad3a4f71843859fef971ea2f - - default default]
Service needs to call os_brick.setup() before connecting volumes, if it doesn't it will break on the next release: nova.exception.VolumeRebaseFailed:
Volume rebase failed: invalid argument: disk vda does not have an active block job
------------
For testing purposes, I created 10 VMs on a compute node with the modified
fs.aio-max-nr. The guest OS (Ubuntu 22) on 4-5 of the VMs returned I/O
errors on the newly attached disks (the disks went read-only). To bring
the VMs back to life, I detached the previously attached volumes and
rebooted the instances.
I increased the limit without restarting the nova-libvirt container.
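For illustration only, a sketch of applying the new value at runtime (equivalent to sysctl -w fs.aio-max-nr=1048576; requires root and does not persist across reboots unless also set in /etc/sysctl.conf or under /etc/sysctl.d/):
------
import pathlib

# Raise the limit at runtime (equivalent to: sysctl -w fs.aio-max-nr=1048576).
limit_file = pathlib.Path("/proc/sys/fs/aio-max-nr")
limit_file.write_text("1048576\n")

# Verify that the kernel accepted the new value.
print(limit_file.read_text().strip())
------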
Steps to reproduce
==================
- reach the maximum value for fs.aio-max-nr (you can decrease it to simulate this)
- increase fs.aio-max-nr to 1048576 on compute B
- migrate instances from compute A to compute B / attach a new disk to an existing instance on compute B
- check the libvirt logs and the guest OS (a quick read-check sketch follows this list)
- reboot the instance and verify that it boots successfully
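A minimal sketch of the guest-side read check from the list above (the device name /dev/vdb is only an example for the newly attached volume; run as root inside the guest):
------
import errno
import os

DEV = "/dev/vdb"  # example device name of the newly attached volume

try:
    fd = os.open(DEV, os.O_RDONLY)
    try:
        os.read(fd, 4096)  # a failing read surfaces the kernel-level I/O error
        print(f"{DEV}: read OK")
    finally:
        os.close(fd)
except OSError as exc:
    if exc.errno == errno.EIO:
        print(f"{DEV}: I/O error (matches the 'Buffer I/O error' seen in the guest dmesg)")
    else:
        raise
------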
Expected result
===============
The increased value doesn't affect existing VMs and volumes
Actual result
=============
Issues with attaching new disks and migrating instances
Environment
===========
OpenStack 2023.2 Bobcat
Kolla-ansible
libvirtd (libvirt) 8.0.0
Maybe this parameter should be increased somewhere else on the hypervisor or in kolla? Dear team, could you please explain what is wrong?
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2093334
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2093334/+subscriptions