yahoo-eng-team team mailing list archive
Message #84937
[Bug 1895316] Re: VM with single GPU flavor gets two GPUs assigned
[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]
** Changed in: nova
Status: Incomplete => Expired
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1895316
Title:
VM with single GPU flavor gets two GPUs assigned
Status in OpenStack Compute (nova):
Expired
Bug description:
Description
===========
Some VMs whose flavor requests a single PCI device (GPU) were provisioned with two GPUs attached.
Steps to reproduce
==================
Deploy a GPU-assigned VM with the following Heat template:
$ openstack stack template show vmaas-p1220-kvm1
description: Template to create a single VM for a self service project in CCCC OpenStack
heat_template_version: rocky
outputs:
  instance_ip:
    description: The IP address of the deployed instance
    value:
      get_attr:
      - server_1
      - first_address
  instance_name:
    description: Name of the instance
    value:
      get_attr:
      - server_1
      - name
parameters:
  flavor:
    default: vmaas.p9.2xxlarge.v100-32.1
    description: Type of instance (flavor) to be used
    label: Flavor
    type: string
  image:
    default: rhel7.6alt-ppc64le
    description: Image to be used for compute instance
    label: Image name or ID
    type: string
  instance_boot_disk_name:
    default: p1220-kvm1-boot
    description: Name of instance boot volume
    label: Instance disk name
    type: string
  instance_boot_disk_size:
    default: '200'
    description: Size of instance boot volume
    label: Instance disk size
    type: string
  instance_ip:
    default: AAA.BB.CC.DD
    description: IP address of compute instance
    label: Instance IP address
    type: string
  instance_name:
    default: p1220-kvm1
    description: Name of compute instance
    label: Instance name
    type: string
  key:
    default: ''
    description: Name of existing ssh key-pair to be used for compute instance
    label: Key name
    type: string
  project_vlan:
    default: '1220'
    description: Project VLAN to attach instance to
    label: Network name or ID
    type: string
resources:
  cloud_config_part1:
    properties:
      cloud_config:
        write_files:
        - content: ==cloud_config_data_here===
          encoding: b64
          owner: root:root
          path: /cloud-config.sh
          permissions: '0700'
    type: OS::Heat::CloudConfig
  cloud_config_part2:
    ==cloud_config_data_here===
    type: OS::Heat::CloudConfig
  cloud_config_run:
    properties:
      parts:
      - config:
          get_resource: cloud_config_part1
      - config:
          get_resource: cloud_config_part2
    type: OS::Heat::MultipartMime
  server_1:
    depends_on:
    - cloud_config_run
    - volume_1
    properties:
      block_device_mapping_v2:
      - boot_index: 0
        delete_on_termination: true
        volume_id:
          get_resource: volume_1
      config_drive: true
      flavor:
        get_param: flavor
      key_name:
        get_param: key
      metadata:
        Flavor:
          get_param: flavor
        Image:
          get_param: image
        Project: XXXXXXXX
        ProjectDescription: ''
        Reservation: YYYYYYYY
        Submitter: Portal/ZZZZZZZ
      name:
        get_param: instance_name
      networks:
      - fixed_ip:
          get_param: instance_ip
        network:
          list_join:
          - ''
          - - v
            - get_param: project_vlan
      user_data:
        get_resource: cloud_config_run
      user_data_format: RAW
    type: OS::Nova::Server
  volume_1:
    properties:
      image:
        get_param: image
      metadata:
        Project: XXXXXXXX
        Reservation: YYYYYYYY
      name:
        get_param: instance_boot_disk_name
      size:
        get_param: instance_boot_disk_size
    type: OS::Cinder::Volume
And the following flavor:
$ openstack flavor show 02542b5c-3bab-43df-9dcd-59f2867f344b
+----------------------------+-----------------------------------------------+
| Field                      | Value                                         |
+----------------------------+-----------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                         |
| OS-FLV-EXT-DATA:ephemeral  | 0                                             |
| access_project_ids         | 4fcaf92a3fa148b4bc15d1a170bb67a0              |
| disk                       | 0                                             |
| id                         | 02542b5c-3bab-43df-9dcd-59f2867f344b          |
| name                       | vmaas.p9.2xxlarge.v100-32.1                   |
| os-flavor-access:is_public | False                                         |
| properties                 | aggregate_instance_extra_specs:cpu='p9',      |
|                            | aggregate_instance_extra_specs:env='vmaas',   |
|                            | aggregate_instance_extra_specs:gpu='v100-32', |
|                            | hw:cpu_cores='8',                             |
|                            | hw:cpu_sockets='1',                           |
|                            | hw:cpu_threads='4',                           |
|                            | pci_passthrough:alias='v100-32:1'             |
| ram                        | 65536                                         |
| rxtx_factor                | 1.0                                           |
| swap                       |                                               |
| vcpus                      | 32                                            |
+----------------------------+-----------------------------------------------+
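The `pci_passthrough:alias` property takes the form `name:count`, with multiple requests separated by commas, so `v100-32:1` in the flavor above asks for exactly one device matching the `v100-32` alias. The following is a minimal sketch for checking that value when auditing flavors. `parse_pci_alias_spec` is a hypothetical helper written for illustration, not a nova function.

```python
def parse_pci_alias_spec(spec):
    """Parse 'name:count[,name:count...]' into a dict of alias name -> count.

    Hypothetical helper for illustration; it mirrors the documented
    flavor syntax but is not part of nova.
    """
    requested = {}
    for item in spec.split(","):
        # Split on the last colon so the alias name keeps any earlier text.
        name, _, count = item.strip().rpartition(":")
        if not name or not count.isdigit():
            raise ValueError(f"malformed alias spec item: {item!r}")
        requested[name] = requested.get(name, 0) + int(count)
    return requested


# Value taken from the flavor properties shown above.
requested = parse_pci_alias_spec("v100-32:1")
print(requested)                 # {'v100-32': 1}
print(sum(requested.values()))   # 1 device requested in total
```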
Expected result
===============
A VM with a single PCI device (GPU) attached.
Actual result
=============
Got a VM with two GPUs attached.
p1220-kvm1 ~]$ lspci
0000:00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
0000:00:02.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
0000:00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
0000:00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
0000:00:06.0 VGA compatible controller: Device 1234:1111 (rev 02)
0001:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
0002:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
$ nvidia-smi
Fri Sep 11 11:28:21 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000001:00:01.0 Off |                    0 |
| N/A   26C    P0    38W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  On   | 00000002:00:01.0 Off |                    0 |
| N/A   28C    P0    39W / 300W |      0MiB / 32510MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
# virsh dumpxml instance-00003268
<domain type='kvm' id='24'>
<name>instance-00003268</name>
<uuid>a49ba344-8b50-4014-baf3-24f8252f212e</uuid>
<metadata>
<nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
<nova:package version="18.2.1"/>
<nova:name>p1220-kvm1</nova:name>
<nova:creationTime>2020-09-11 08:28:22</nova:creationTime>
<nova:flavor name="vmaas.p9.2xxlarge.v100-32.1">
<nova:memory>65536</nova:memory>
<nova:disk>0</nova:disk>
<nova:swap>0</nova:swap>
<nova:ephemeral>0</nova:ephemeral>
<nova:vcpus>32</nova:vcpus>
</nova:flavor>
<nova:owner>
<nova:user uuid="cc738b41714c47d5917a0a4a152ad133">vmaas</nova:user>
<nova:project uuid="4fcaf92a3fa148b4bc15d1a170bb67a0">vmaas</nova:project>
</nova:owner>
</nova:instance>
</metadata>
<memory unit='KiB'>67108864</memory>
<currentMemory unit='KiB'>67108864</currentMemory>
<vcpu placement='static'>32</vcpu>
<cputune>
<shares>32768</shares>
</cputune>
<resource>
<partition>/machine</partition>
</resource>
<os>
<type arch='ppc64le' machine='pseries-bionic'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
</features>
<cpu mode='host-passthrough' check='none'>
<topology sockets='1' cores='8' threads='4'/>
</cpu>
<clock offset='utc'>
<timer name='pit' tickpolicy='delay'/>
<timer name='rtc' tickpolicy='catchup'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<devices>
<emulator>/usr/bin/kvm</emulator>
<disk type='network' device='cdrom'>
<driver name='qemu' type='raw' cache='none' discard='unmap'/>
<auth username='nova-compute'>
<secret type='ceph' uuid='514c9fca-8cbe-11e2-9c52-3bc8c7819472'/>
</auth>
<source protocol='rbd' name='nova/a49ba344-8b50-4014-baf3-24f8252f212e_disk.config'>
<host name='10.0.0.11' port='6789'/>
<host name='10.0.0.12' port='6789'/>
<host name='10.0.0.13' port='6789'/>
</source>
<target dev='sda' bus='scsi'/>
<readonly/>
<alias name='scsi0-0-0-1'/>
<address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
<disk type='network' device='disk'>
<driver name='qemu' type='raw' cache='none' discard='unmap'/>
<auth username='cinder-ceph'>
<secret type='ceph' uuid='046a66b2-bf3d-4be9-a8c1-1334c3fbc3d7'/>
</auth>
<source protocol='rbd' name='cinder-ceph/volume-b7584535-d482-4bd8-bd0f-c08435708f23'>
<host name='10.0.0.11' port='6789'/>
<host name='10.0.0.12' port='6789'/>
<host name='10.0.0.13' port='6789'/>
</source>
<target dev='vda' bus='virtio'/>
<serial>b7584535-d482-4bd8-bd0f-c08435708f23</serial>
<alias name='virtio-disk0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</disk>
<controller type='scsi' index='0' model='virtio-scsi'>
<alias name='scsi0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</controller>
<controller type='usb' index='0' model='qemu-xhci'>
<alias name='usb'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</controller>
<controller type='pci' index='0' model='pci-root'>
<model name='spapr-pci-host-bridge'/>
<target index='0'/>
<alias name='pci.0'/>
</controller>
<controller type='pci' index='1' model='pci-root'>
<model name='spapr-pci-host-bridge'/>
<target index='1'/>
<alias name='pci.1'/>
</controller>
<controller type='pci' index='2' model='pci-root'>
<model name='spapr-pci-host-bridge'/>
<target index='2'/>
<alias name='pci.2'/>
</controller>
<interface type='bridge'>
<mac address='fa:16:3e:36:5a:d5'/>
<source bridge='br-int'/>
<virtualport type='openvswitch'>
<parameters interfaceid='bb1b7bfb-bbe9-4560-8dd0-ae1688cdb16c'/>
</virtualport>
<target dev='tapbb1b7bfb-bb'/>
<model type='virtio'/>
<mtu size='9000'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/0'/>
<log file='/var/lib/nova/instances/a49ba344-8b50-4014-baf3-24f8252f212e/console.log' append='off'/>
<target type='spapr-vio-serial' port='0'>
<model name='spapr-vty'/>
</target>
<alias name='serial0'/>
<address type='spapr-vio' reg='0x30000000'/>
</serial>
<console type='pty' tty='/dev/pts/0'>
<source path='/dev/pts/0'/>
<log file='/var/lib/nova/instances/a49ba344-8b50-4014-baf3-24f8252f212e/console.log' append='off'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
<address type='spapr-vio' reg='0x30000000'/>
</console>
<input type='tablet' bus='usb'>
<alias name='input0'/>
<address type='usb' bus='0' port='1'/>
</input>
<input type='keyboard' bus='usb'>
<alias name='input1'/>
<address type='usb' bus='0' port='2'/>
</input>
<input type='mouse' bus='usb'>
<alias name='input2'/>
<address type='usb' bus='0' port='3'/>
</input>
<graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0' keymap='en-us'>
<listen type='address' address='0.0.0.0'/>
</graphics>
<video>
<model type='vga' vram='16384' heads='1' primary='yes'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</video>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0004' bus='0x04' slot='0x00' function='0x0'/>
</source>
<alias name='hostdev0'/>
<address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0004' bus='0x05' slot='0x00' function='0x0'/>
</source>
<alias name='hostdev1'/>
<address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
</hostdev>
<memballoon model='virtio'>
<stats period='10'/>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</memballoon>
<panic model='pseries'/>
</devices>
<seclabel type='dynamic' model='apparmor' relabel='yes'>
<label>libvirt-a49ba344-8b50-4014-baf3-24f8252f212e</label>
<imagelabel>libvirt-a49ba344-8b50-4014-baf3-24f8252f212e</imagelabel>
</seclabel>
<seclabel type='dynamic' model='dac' relabel='yes'>
<label>+64055:+116</label>
<imagelabel>+64055:+116</imagelabel>
</seclabel>
</domain>
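The domain XML above contains two `<hostdev type='pci'>` elements, with host addresses 0004:04:00.0 and 0004:05:00.0, although the flavor requests one device. To look for other affected instances, the same check can be scripted against `virsh dumpxml` output. The sketch below uses only the Python standard library. `host_pci_addresses` and `SAMPLE_XML` are hypothetical names introduced here, and the sample is a trimmed copy of the two hostdev entries from the dump above.

```python
import xml.etree.ElementTree as ET


def host_pci_addresses(domain_xml):
    """Return the host PCI addresses of all PCI <hostdev> devices in a domain."""
    root = ET.fromstring(domain_xml)
    addresses = []
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        src = hostdev.find("./source/address")
        if src is None:
            continue
        # libvirt writes these attributes as hex strings such as '0x0004'.
        domain = int(src.get("domain"), 16)
        bus = int(src.get("bus"), 16)
        slot = int(src.get("slot"), 16)
        function = int(src.get("function"), 16)
        addresses.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return addresses


# Trimmed copy of the hostdev entries from the dump above (assumption:
# the rest of the domain definition is irrelevant to the count).
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0004' bus='0x04' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0004' bus='0x05' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""

attached = host_pci_addresses(SAMPLE_XML)
print(attached)        # ['0004:04:00.0', '0004:05:00.0']
print(len(attached))   # 2, versus 1 requested by the flavor
```

On a compute node, the full output of `virsh dumpxml instance-00003268` could be passed to the function in place of `SAMPLE_XML`.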
Environment
===========
1. Canonical OpenStack Rocky on ppc64le
dpkg -l | grep nova
ii nova-api-os-compute 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - OpenStack Compute API frontend
ii nova-common 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - common files
ii nova-conductor 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - conductor service
ii nova-consoleauth 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - Console Authenticator
ii nova-novncproxy 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - NoVNC proxy
ii nova-placement-api 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - placement API frontend
ii nova-scheduler 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - virtual machine scheduler
ii python-novaclient 2:11.0.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - Python 2.7
ii python3-nova 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute Python 3 libraries
2. Which hypervisor did you use?
QEMU-KVM ppc64le version 2.11
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.3 LTS
Release: 18.04
Codename: bionic
# dpkg -l | egrep "libvirt|kvm|nova"
ii libvirt-clients 4.0.0-1ubuntu8.13 ppc64el Programs for the libvirt library
ii libvirt-daemon 4.0.0-1ubuntu8.13 ppc64el Virtualization daemon
ii libvirt-daemon-driver-storage-rbd 4.0.0-1ubuntu8.13 ppc64el Virtualization daemon RBD storage driver
ii libvirt-daemon-system 4.0.0-1ubuntu8.13 ppc64el Libvirt daemon configuration files
ii libvirt0:ppc64el 4.0.0-1ubuntu8.13 ppc64el library for interfacing with different virtualization systems
ii nova-api-metadata 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - metadata API frontend
ii nova-common 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - common files
ii nova-compute 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - compute node base
ii nova-compute-kvm 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute - compute node libvirt support
ii python3-libvirt 4.0.0-1 ppc64el libvirt Python 3 bindings
ii python3-nova 2:18.2.1-0ubuntu1~cloud4 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:11.0.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - 3.x
ii qemu-kvm 1:2.11+dfsg-1ubuntu7.17 ppc64el QEMU Full virtualization on x86 hardware
Logs & Configs
==============
Logs and configs attached
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1895316/+subscriptions