
yahoo-eng-team team mailing list archive

[Bug 1780225] Re: Libvirt error when using --max > 1 with vGPU

 

In Stein, we merged support for multiple Resource Providers, one per pGPU.
In Ussuri, we added support for configuring a specific vGPU type per pGPU.

I have now tested the above behaviour with https://review.opendev.org/723858
and it works, unless you ask for a specific total capacity.
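To illustrate what "one Resource Provider per pGPU" means for multi-create, here is a minimal Python sketch. It is a hypothetical model, not Nova or Placement code: `PGpuProvider` and `place_instances` are invented names, and the "most free capacity first" policy is only an illustration. The point it shows is that every vGPU allocation is tied to exactly one physical GPU, so the driver knows where each mediated device must live.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PGpuProvider:
    """Hypothetical model of one resource provider per physical GPU."""

    name: str
    total: int  # how many vGPUs of the configured type this pGPU can host
    used: int = 0

    def free(self) -> int:
        return self.total - self.used


def place_instances(providers: list[PGpuProvider], count: int) -> list[str]:
    """Place `count` instances that each request one vGPU.

    Every allocation lands on exactly one pGPU provider. The spreading
    policy (most free capacity first) is illustrative only.
    """
    placements: list[str] = []
    for _ in range(count):
        candidates = [p for p in providers if p.free() > 0]
        if not candidates:
            raise RuntimeError("no pGPU provider has free vGPU capacity")
        best = max(candidates, key=lambda p: p.free())  # first one wins on ties
        best.used += 1
        placements.append(best.name)
    return placements


if __name__ == "__main__":
    pgpus = [PGpuProvider("pgpu_0", total=16), PGpuProvider("pgpu_1", total=16)]
    print(place_instances(pgpus, 3))  # ['pgpu_0', 'pgpu_1', 'pgpu_0']
```

A real deployment gets its per-pGPU totals from the driver, not from hard-coded numbers as in this sketch.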

I'm closing this bug, which was specific to libvirt vGPUs; please look at
https://bugs.launchpad.net/nova/+bug/1874664 for the related issue.

** Changed in: nova
       Status: Confirmed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1780225

Title:
  Libvirt error when using --max > 1 with vGPU

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Description
  ===========

  Using devstack Rocky with an NVIDIA Tesla M10 + GRID driver on RHEL 7.5.
  Profile used in nova: nvidia-35 (num_heads=2, frl_config=45, framebuffer=512M, max_resolution=2560x1600, max_instance=16)
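  The profile values are self-consistent: 16 instances of 512 MiB each fill 8 GiB, which is one M10 physical GPU. The sketch below does that back-of-the-envelope check in Python. It assumes (not stated in this report) that a Tesla M10 board exposes four physical GPUs with 8 GiB each; the authoritative count is whatever the driver reports in the mdev sysfs `available_instances` file.

```python
from __future__ import annotations

# Profile values copied from the bug description (nvidia-35).
FRAMEBUFFER_MIB = 512
MAX_INSTANCE = 16

# Assumptions, not from the bug report: a Tesla M10 board exposes four
# physical GPUs, each with 8 GiB of framebuffer memory.
PGPUS_PER_BOARD = 4
PGPU_MEMORY_MIB = 8 * 1024


def vgpu_capacity(framebuffer_mib: int, pgpu_memory_mib: int, pgpus: int) -> tuple[int, int]:
    """Return (vGPUs per pGPU, vGPUs per board) for a fixed-size profile."""
    per_pgpu = pgpu_memory_mib // framebuffer_mib
    return per_pgpu, per_pgpu * pgpus


per_pgpu, per_board = vgpu_capacity(FRAMEBUFFER_MIB, PGPU_MEMORY_MIB, PGPUS_PER_BOARD)
assert per_pgpu == MAX_INSTANCE  # matches max_instance=16 from the profile
print(per_pgpu, per_board)  # 16 64
```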

  I can launch instances one by one without any issue.
  I cannot use a --max parameter greater than 1.

  Expected result
  ===============

  Be able to use the --max parameter with vGPU.

  Steps to reproduce
  ==================

  [root@host2 ~]# openstack server list
  +--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+
  | ID                                   | Name      | Status | Networks                                                            | Image  | Flavor |
  +--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+
  | 56aeda96-f193-49fc-914d-8b507674eb16 | instance0 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu   |
  +--------------------------------------+-----------+--------+---------------------------------------------------------------------+--------+--------+

  [root@host2 ~]# openstack server create --flavor vgpu --image rhel75 --key-name myself --max 2 instance
  +-------------------------------------+-----------------------------------------------+
  | Field                               | Value                                         |
  +-------------------------------------+-----------------------------------------------+
  | OS-DCF:diskConfig                   | MANUAL                                        |
  | OS-EXT-AZ:availability_zone         |                                               |
  | OS-EXT-SRV-ATTR:host                | None                                          |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | None                                          |
  | OS-EXT-SRV-ATTR:instance_name       |                                               |
  | OS-EXT-STS:power_state              | NOSTATE                                       |
  | OS-EXT-STS:task_state               | scheduling                                    |
  | OS-EXT-STS:vm_state                 | building                                      |
  | OS-SRV-USG:launched_at              | None                                          |
  | OS-SRV-USG:terminated_at            | None                                          |
  | accessIPv4                          |                                               |
  | accessIPv6                          |                                               |
  | addresses                           |                                               |
  | adminPass                           | iNiFmD6kNszw                                  |
  | config_drive                        |                                               |
  | created                             | 2018-07-05T09:19:25Z                          |
  | flavor                              | vgpu (vgpu1)                                  |
  | hostId                              |                                               |
  | id                                  | 5a8691a8-a18c-4c71-8541-be00f224fd82          |
  | image                               | rhel75 (e63a49a8-4568-4b57-9d12-1eb1ede28438) |
  | key_name                            | myself                                        |
  | name                                | instance-1                                    |
  | progress                            | 0                                             |
  | project_id                          | fdea2c781db74ae593c5e9501e9290cc              |
  | properties                          |                                               |
  | security_groups                     | name='default'                                |
  | status                              | BUILD                                         |
  | updated                             | 2018-07-05T09:19:25Z                          |
  | user_id                             | 130a646fc362418f8b62ac11f1154942              |
  | volumes_attached                    |                                               |
  +-------------------------------------+-----------------------------------------------+

  [root@host2 ~]# openstack server list
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | ID                                   | Name       | Status | Networks                                                            | Image  | Flavor |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR  |                                                                     | rhel75 | vgpu   |
  | 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11              | rhel75 | vgpu   |
  | 56aeda96-f193-49fc-914d-8b507674eb16 | instance0  | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu   |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

  [root@host2 ~]# openstack server create --flavor vgpu --image rhel75 --key-name myself --max 1 instance
  +-------------------------------------+-----------------------------------------------+
  | Field                               | Value                                         |
  +-------------------------------------+-----------------------------------------------+
  | OS-DCF:diskConfig                   | MANUAL                                        |
  | OS-EXT-AZ:availability_zone         |                                               |
  | OS-EXT-SRV-ATTR:host                | None                                          |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | None                                          |
  | OS-EXT-SRV-ATTR:instance_name       |                                               |
  | OS-EXT-STS:power_state              | NOSTATE                                       |
  | OS-EXT-STS:task_state               | scheduling                                    |
  | OS-EXT-STS:vm_state                 | building                                      |
  | OS-SRV-USG:launched_at              | None                                          |
  | OS-SRV-USG:terminated_at            | None                                          |
  | accessIPv4                          |                                               |
  | accessIPv6                          |                                               |
  | addresses                           |                                               |
  | adminPass                           | MGxmntECb22S                                  |
  | config_drive                        |                                               |
  | created                             | 2018-07-05T09:19:45Z                          |
  | flavor                              | vgpu (vgpu1)                                  |
  | hostId                              |                                               |
  | id                                  | 24df940f-500b-44db-88e2-a6fd1fe915c0          |
  | image                               | rhel75 (e63a49a8-4568-4b57-9d12-1eb1ede28438) |
  | key_name                            | myself                                        |
  | name                                | instance                                      |
  | progress                            | 0                                             |
  | project_id                          | fdea2c781db74ae593c5e9501e9290cc              |
  | properties                          |                                               |
  | security_groups                     | name='default'                                |
  | status                              | BUILD                                         |
  | updated                             | 2018-07-05T09:19:45Z                          |
  | user_id                             | 130a646fc362418f8b62ac11f1154942              |
  | volumes_attached                    |                                               |
  +-------------------------------------+-----------------------------------------------+

  [root@host2 ~]# openstack server list
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | ID                                   | Name       | Status | Networks                                                            | Image  | Flavor |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | 24df940f-500b-44db-88e2-a6fd1fe915c0 | instance   | BUILD  | private=fda2:f16f:605e:0:f816:3eff:fefd:8796, 10.0.0.7              | rhel75 | vgpu   |
  | 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR  |                                                                     | rhel75 | vgpu   |
  | 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11              | rhel75 | vgpu   |
  | 56aeda96-f193-49fc-914d-8b507674eb16 | instance0  | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu   |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

  [root@host2 ~]# openstack server list
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | ID                                   | Name       | Status | Networks                                                            | Image  | Flavor |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | 24df940f-500b-44db-88e2-a6fd1fe915c0 | instance   | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fefd:8796, 10.0.0.7              | rhel75 | vgpu   |
  | 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR  |                                                                     | rhel75 | vgpu   |
  | 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11              | rhel75 | vgpu   |
  | 56aeda96-f193-49fc-914d-8b507674eb16 | instance0  | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu   |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

  [root@host2 ~]# openstack server create --flavor vgpu --image rhel75 --key-name myself --max 1 instance
  +-------------------------------------+-----------------------------------------------+
  | Field                               | Value                                         |
  +-------------------------------------+-----------------------------------------------+
  | OS-DCF:diskConfig                   | MANUAL                                        |
  | OS-EXT-AZ:availability_zone         |                                               |
  | OS-EXT-SRV-ATTR:host                | None                                          |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | None                                          |
  | OS-EXT-SRV-ATTR:instance_name       |                                               |
  | OS-EXT-STS:power_state              | NOSTATE                                       |
  | OS-EXT-STS:task_state               | scheduling                                    |
  | OS-EXT-STS:vm_state                 | building                                      |
  | OS-SRV-USG:launched_at              | None                                          |
  | OS-SRV-USG:terminated_at            | None                                          |
  | accessIPv4                          |                                               |
  | accessIPv6                          |                                               |
  | addresses                           |                                               |
  | adminPass                           | 69crZEFxBT9j                                  |
  | config_drive                        |                                               |
  | created                             | 2018-07-05T09:21:43Z                          |
  | flavor                              | vgpu (vgpu1)                                  |
  | hostId                              |                                               |
  | id                                  | 4a172549-91c2-46cc-8895-cd2fcbb19430          |
  | image                               | rhel75 (e63a49a8-4568-4b57-9d12-1eb1ede28438) |
  | key_name                            | myself                                        |
  | name                                | instance                                      |
  | progress                            | 0                                             |
  | project_id                          | fdea2c781db74ae593c5e9501e9290cc              |
  | properties                          |                                               |
  | security_groups                     | name='default'                                |
  | status                              | BUILD                                         |
  | updated                             | 2018-07-05T09:21:43Z                          |
  | user_id                             | 130a646fc362418f8b62ac11f1154942              |
  | volumes_attached                    |                                               |
  +-------------------------------------+-----------------------------------------------+

  [root@host2 ~]# openstack server list
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | ID                                   | Name       | Status | Networks                                                            | Image  | Flavor |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | 4a172549-91c2-46cc-8895-cd2fcbb19430 | instance   | BUILD  |                                                                     | rhel75 | vgpu   |
  | 24df940f-500b-44db-88e2-a6fd1fe915c0 | instance   | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fefd:8796, 10.0.0.7              | rhel75 | vgpu   |
  | 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR  |                                                                     | rhel75 | vgpu   |
  | 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11              | rhel75 | vgpu   |
  | 56aeda96-f193-49fc-914d-8b507674eb16 | instance0  | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu   |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

  [root@host2 ~]# openstack server list
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | ID                                   | Name       | Status | Networks                                                            | Image  | Flavor |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+
  | 4a172549-91c2-46cc-8895-cd2fcbb19430 | instance   | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe7d:a6d8, 10.0.0.4              | rhel75 | vgpu   |
  | 24df940f-500b-44db-88e2-a6fd1fe915c0 | instance   | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fefd:8796, 10.0.0.7              | rhel75 | vgpu   |
  | 515f0d21-6ab8-406e-9889-177718c79e61 | instance-2 | ERROR  |                                                                     | rhel75 | vgpu   |
  | 5a8691a8-a18c-4c71-8541-be00f224fd82 | instance-1 | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fe1f:d7a, 10.0.0.11              | rhel75 | vgpu   |
  | 56aeda96-f193-49fc-914d-8b507674eb16 | instance0  | ACTIVE | private=fda2:f16f:605e:0:f816:3eff:fef2:8e20, 10.0.0.12, 172.24.4.2 | rhel75 | vgpu   |
  +--------------------------------------+------------+--------+---------------------------------------------------------------------+--------+--------+

  - Nova error:
  {u'message': u'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance de2a5078-6acd-4ffd-9895-d664adb42296.', u'code': 500, u'details': u'  File "/opt/stack/nova/nova/conductor/manager.py", line 579, in build_instances\n    raise exception.MaxRetriesExceeded(reason=msg)\n', u'created': u'2018-07-05T07:32:52Z'}

  - Libvirt error:
  messages:Jul  5 03:32:51 host2 nova-compute: libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/25f56195-9719-4380-a90b-084d64307e06 is in use by driver QEMU, domain instance-00000019
  messages:Jul  5 03:32:51 host2 nova-compute: ERROR nova.virt.libvirt.driver [None req-e04582ed-de22-4bfa-9253-92e687328a4c service nova] [instance: de2a5078-6acd-4ffd-9895-d664adb42296] Failed to start libvirt guest: libvirtError: Requested operation is not valid: mediated device /sys/bus/mdev/devices/25f56195-9719-4380-a90b-084d64307e06 is in use by driver QEMU, domain instance-00000019
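  The libvirt error shows the failing guest was handed mediated device 25f56195-9719-4380-a90b-084d64307e06 while it was already attached to domain instance-00000019. One plausible way this happens with --max > 1 is a check-then-use race: two builds each look for an unused mdev before either guest has started, so both pick the same one. The Python sketch below is a hypothetical model of that race and of the usual remedy, reserving the device under a lock. `MdevPool` and its methods are invented names, not Nova's implementation.

```python
import threading


class MdevPool:
    """Hypothetical pool of pre-created mediated devices (not Nova code)."""

    def __init__(self, mdevs):
        self._mdevs = list(mdevs)
        self._in_use = set()  # mdevs currently attached to a running guest
        self._lock = threading.Lock()

    def pick_unsafe(self):
        """Check-then-use: return a free mdev without reserving it."""
        for mdev in self._mdevs:
            if mdev not in self._in_use:
                return mdev
        return None

    def attach(self, mdev):
        """Simulate starting a guest on `mdev`; fail if it is already in use."""
        with self._lock:
            if mdev in self._in_use:
                raise RuntimeError(f"mediated device {mdev} is in use")
            self._in_use.add(mdev)

    def reserve(self):
        """Find and mark a free mdev in one step, under the lock."""
        with self._lock:
            for mdev in self._mdevs:
                if mdev not in self._in_use:
                    self._in_use.add(mdev)
                    return mdev
            return None


# Two builds from one --max 2 request, interleaved: both look before either attaches.
racy = MdevPool(["mdev-a", "mdev-b"])
first, second = racy.pick_unsafe(), racy.pick_unsafe()
racy.attach(first)
try:
    racy.attach(second)  # same mdev as `first`, so this fails
except RuntimeError as exc:
    print("second guest failed:", exc)

# Atomic reservation hands out distinct devices, even from several threads.
safe = MdevPool(["mdev-a", "mdev-b"])
results = []
threads = [threading.Thread(target=lambda: results.append(safe.reserve())) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['mdev-a', 'mdev-b']
```

  Serializing the selection and reservation is the key design choice in this sketch; whether the upstream fix works this way is not stated in this report.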
