
yahoo-eng-team team mailing list archive

[Bug 1895316] Re: VM with single GPU flavor gets two GPUs assigned

 

[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1895316

Title:
  VM with single GPU flavor gets two GPUs assigned

Status in OpenStack Compute (nova):
  Expired

Bug description:
  Description
  ===========
  Some VMs whose flavor requests a single PCI device (GPU) were provisioned with two GPUs attached.

  Steps to reproduce
  ==================
  Deploy a GPU-assigned VM with the following Heat template:
  $ openstack stack template show vmaas-p1220-kvm1
  description: Template to create a single VM for a self service project in CCCC OpenStack
  heat_template_version: rocky
  outputs:
    instance_ip:
      description: The IP address of the deployed instance
      value:
        get_attr:
        - server_1
        - first_address
    instance_name:
      description: Name of the instance
      value:
        get_attr:
        - server_1
        - name
  parameters:
    flavor:
      default: vmaas.p9.2xxlarge.v100-32.1
      description: Type of instance (flavor) to be used
      label: Flavor
      type: string
    image:
      default: rhel7.6alt-ppc64le
      description: Image to be used for compute instance
      label: Image name or ID
      type: string
    instance_boot_disk_name:
      default: p1220-kvm1-boot
      description: Name of instance boot volume
      label: Instance disk name
      type: string
    instance_boot_disk_size:
      default: '200'
      description: Size of instance boot volume
      label: Instance disk size
      type: string
    instance_ip:
      default: AAA.BB.CC.DD
      description: IP address of compute instance
      label: Instance IP address
      type: string
    instance_name:
      default: p1220-kvm1
      description: Name of compute instance
      label: Instance name
      type: string
    key:
      default: ''
      description: Name of existing ssh key-pair to be used for compute instance
      label: Key name
      type: string
    project_vlan:
      default: '1220'
      description: Project VLAN to attach instance to
      label: Network name or ID
      type: string
  resources:
    cloud_config_part1:
      properties:
        cloud_config:
          write_files:
          - content: ==cloud_config_data_here===
            encoding: b64
            owner: root:root
            path: /cloud-config.sh
            permissions: '0700'
      type: OS::Heat::CloudConfig
    cloud_config_part2:
      ==cloud_config_data_here===
      type: OS::Heat::CloudConfig
    cloud_config_run:
      properties:
        parts:
        - config:
            get_resource: cloud_config_part1
        - config:
            get_resource: cloud_config_part2
      type: OS::Heat::MultipartMime
    server_1:
      depends_on:
      - cloud_config_run
      - volume_1
      properties:
        block_device_mapping_v2:
        - boot_index: 0
          delete_on_termination: true
          volume_id:
            get_resource: volume_1
        config_drive: true
        flavor:
          get_param: flavor
        key_name:
          get_param: key
        metadata:
          Flavor:
            get_param: flavor
          Image:
            get_param: image
          Project: XXXXXXXX
          ProjectDescription: ''
          Reservation: YYYYYYYY
          Submitter: Portal/ZZZZZZZ
        name:
          get_param: instance_name
        networks:
        - fixed_ip:
            get_param: instance_ip
          network:
            list_join:
            - ''
            - - v
              - get_param: project_vlan
        user_data:
          get_resource: cloud_config_run
        user_data_format: RAW
      type: OS::Nova::Server
    volume_1:
      properties:
        image:
          get_param: image
        metadata:
          Project: XXXXXXXX
          Reservation: YYYYYYYY
        name:
          get_param: instance_boot_disk_name
        size:
          get_param: instance_boot_disk_size
      type: OS::Cinder::Volume
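  The network for server_1 is resolved with Heat's list_join intrinsic, using an empty delimiter and the list ['v', project_vlan]. The Python snippet below is only an emulation of that behaviour for illustration, not Heat's own code. It shows that the default project_vlan of '1220' resolves to the network name v1220.

```python
def list_join(delimiter, *lists):
    """Rough emulation of Heat's list_join: join the items of one or more lists."""
    return delimiter.join(str(item) for lst in lists for item in lst)

# Arguments as written in the template: delimiter '' and the list ['v', <project_vlan>]
project_vlan = "1220"  # default value of the project_vlan parameter
network_name = list_join("", ["v", project_vlan])
print(network_name)  # v1220
```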

  And the flavor:
  $ openstack flavor show 02542b5c-3bab-43df-9dcd-59f2867f344b
  +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | Field                      | Value                                                                                                                                                                                                                          |
  +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
  | OS-FLV-DISABLED:disabled   | False                                                                                                                                                                                                                          |
  | OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                                                                                                              |
  | access_project_ids         | 4fcaf92a3fa148b4bc15d1a170bb67a0                                                                                                                                                                                               |
  | disk                       | 0                                                                                                                                                                                                                              |
  | id                         | 02542b5c-3bab-43df-9dcd-59f2867f344b                                                                                                                                                                                           |
  | name                       | vmaas.p9.2xxlarge.v100-32.1                                                                                                                                                                                                    |
  | os-flavor-access:is_public | False                                                                                                                                                                                                                          |
  | properties                 | aggregate_instance_extra_specs:cpu='p9', aggregate_instance_extra_specs:env='vmaas', aggregate_instance_extra_specs:gpu='v100-32', hw:cpu_cores='8', hw:cpu_sockets='1', hw:cpu_threads='4', pci_passthrough:alias='v100-32:1' |
  | ram                        | 65536                                                                                                                                                                                                                          |
  | rxtx_factor                | 1.0                                                                                                                                                                                                                            |
  | swap                       |                                                                                                                                                                                                                                |
  | vcpus                      | 32                                                                                                                                                                                                                             |
  +----------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
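  The flavor requests its GPU only through the pci_passthrough:alias extra spec. That value is a comma-separated list of <alias>:<count> pairs. The Python sketch below pulls that spec out of the properties field shown above and adds up the requested devices. The helper names are hypothetical, and the key='value' parsing assumes the CLI formatting seen here. For this flavor the result is exactly one v100-32 device.

```python
import re

# The `properties` value copied from the `openstack flavor show` output above.
PROPERTIES = (
    "aggregate_instance_extra_specs:cpu='p9', "
    "aggregate_instance_extra_specs:env='vmaas', "
    "aggregate_instance_extra_specs:gpu='v100-32', "
    "hw:cpu_cores='8', hw:cpu_sockets='1', hw:cpu_threads='4', "
    "pci_passthrough:alias='v100-32:1'"
)

def parse_properties(text):
    """Split the CLI's key='value', ... listing into a dict (hypothetical helper)."""
    return dict(re.findall(r"([\w:.-]+)='([^']*)'", text))

def requested_pci_devices(alias_spec):
    """Map each alias in an '<alias>:<count>[,...]' spec to its total requested count."""
    requests = {}
    for entry in alias_spec.split(","):
        name, _, count = entry.strip().rpartition(":")
        requests[name] = requests.get(name, 0) + int(count)
    return requests

specs = parse_properties(PROPERTIES)
requests = requested_pci_devices(specs["pci_passthrough:alias"])
print(requests)                # {'v100-32': 1}
print(sum(requests.values()))  # 1
```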

  Expected result
  ===============
  A VM with exactly one PCI device (GPU) attached.

  Actual result
  =============
  A VM with two GPUs attached:
  p1220-kvm1 ~]$ lspci
  0000:00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
  0000:00:02.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
  0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
  0000:00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
  0000:00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
  0000:00:06.0 VGA compatible controller: Device 1234:1111 (rev 02)
  0001:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
  0002:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
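  Counting the passed-through GPUs in the guest's lspci output makes the mismatch explicit. The Python sketch below works on the listing above, captured as text. Matching on the substring "3D controller: NVIDIA" is an assumption that fits this output and is not meant as a general GPU detector.

```python
# Guest `lspci` output, copied from the listing above.
LSPCI = """\
0000:00:01.0 Ethernet controller: Red Hat, Inc. Virtio network device
0000:00:02.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
0000:00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
0000:00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
0000:00:06.0 VGA compatible controller: Device 1234:1111 (rev 02)
0001:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
0002:00:01.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 SXM2 32GB] (rev a1)
"""

def nvidia_gpus(lspci_text):
    """Return guest PCI addresses of lines describing NVIDIA 3D controllers."""
    return [
        line.split()[0]
        for line in lspci_text.splitlines()
        if "3D controller: NVIDIA" in line
    ]

gpus = nvidia_gpus(LSPCI)
print(len(gpus), gpus)  # 2 ['0001:00:01.0', '0002:00:01.0']
```

  Inside a live guest, the same function could be applied to the stdout of `subprocess.run(["lspci"], capture_output=True, text=True)`.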

  $ nvidia-smi
  Fri Sep 11 11:28:21 2020
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |===============================+======================+======================|
  |   0  Tesla V100-SXM2...  On   | 00000001:00:01.0 Off |                    0 |
  | N/A   26C    P0    38W / 300W |      0MiB / 32510MiB |      0%      Default |
  +-------------------------------+----------------------+----------------------+
  |   1  Tesla V100-SXM2...  On   | 00000002:00:01.0 Off |                    0 |
  | N/A   28C    P0    39W / 300W |      0MiB / 32510MiB |      0%      Default |
  +-------------------------------+----------------------+----------------------+

  +-----------------------------------------------------------------------------+
  | Processes:                                                       GPU Memory |
  |  GPU       PID   Type   Process name                             Usage      |
  |=============================================================================|
  |  No running processes found                                                 |
  +-----------------------------------------------------------------------------+

  # virsh dumpxml instance-00003268
  <domain type='kvm' id='24'>
    <name>instance-00003268</name>
    <uuid>a49ba344-8b50-4014-baf3-24f8252f212e</uuid>
    <metadata>
      <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
        <nova:package version="18.2.1"/>
        <nova:name>p1220-kvm1</nova:name>
        <nova:creationTime>2020-09-11 08:28:22</nova:creationTime>
        <nova:flavor name="vmaas.p9.2xxlarge.v100-32.1">
          <nova:memory>65536</nova:memory>
          <nova:disk>0</nova:disk>
          <nova:swap>0</nova:swap>
          <nova:ephemeral>0</nova:ephemeral>
          <nova:vcpus>32</nova:vcpus>
        </nova:flavor>
        <nova:owner>
          <nova:user uuid="cc738b41714c47d5917a0a4a152ad133">vmaas</nova:user>
          <nova:project uuid="4fcaf92a3fa148b4bc15d1a170bb67a0">vmaas</nova:project>
        </nova:owner>
      </nova:instance>
    </metadata>
    <memory unit='KiB'>67108864</memory>
    <currentMemory unit='KiB'>67108864</currentMemory>
    <vcpu placement='static'>32</vcpu>
    <cputune>
      <shares>32768</shares>
    </cputune>
    <resource>
      <partition>/machine</partition>
    </resource>
    <os>
      <type arch='ppc64le' machine='pseries-bionic'>hvm</type>
      <boot dev='hd'/>
    </os>
    <features>
      <acpi/>
      <apic/>
    </features>
    <cpu mode='host-passthrough' check='none'>
      <topology sockets='1' cores='8' threads='4'/>
    </cpu>
    <clock offset='utc'>
      <timer name='pit' tickpolicy='delay'/>
      <timer name='rtc' tickpolicy='catchup'/>
    </clock>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
      <emulator>/usr/bin/kvm</emulator>
      <disk type='network' device='cdrom'>
        <driver name='qemu' type='raw' cache='none' discard='unmap'/>
        <auth username='nova-compute'>
          <secret type='ceph' uuid='514c9fca-8cbe-11e2-9c52-3bc8c7819472'/>
        </auth>
        <source protocol='rbd' name='nova/a49ba344-8b50-4014-baf3-24f8252f212e_disk.config'>
          <host name='10.0.0.11' port='6789'/>
          <host name='10.0.0.12' port='6789'/>
          <host name='10.0.0.13' port='6789'/>
        </source>
        <target dev='sda' bus='scsi'/>
        <readonly/>
        <alias name='scsi0-0-0-1'/>
        <address type='drive' controller='0' bus='0' target='0' unit='1'/>
      </disk>
      <disk type='network' device='disk'>
        <driver name='qemu' type='raw' cache='none' discard='unmap'/>
        <auth username='cinder-ceph'>
          <secret type='ceph' uuid='046a66b2-bf3d-4be9-a8c1-1334c3fbc3d7'/>
        </auth>
        <source protocol='rbd' name='cinder-ceph/volume-b7584535-d482-4bd8-bd0f-c08435708f23'>
          <host name='10.0.0.11' port='6789'/>
          <host name='10.0.0.12' port='6789'/>
          <host name='10.0.0.13' port='6789'/>
        </source>
        <target dev='vda' bus='virtio'/>
        <serial>b7584535-d482-4bd8-bd0f-c08435708f23</serial>
        <alias name='virtio-disk0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
      </disk>
      <controller type='scsi' index='0' model='virtio-scsi'>
        <alias name='scsi0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
      </controller>
      <controller type='usb' index='0' model='qemu-xhci'>
        <alias name='usb'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
      </controller>
      <controller type='pci' index='0' model='pci-root'>
        <model name='spapr-pci-host-bridge'/>
        <target index='0'/>
        <alias name='pci.0'/>
      </controller>
      <controller type='pci' index='1' model='pci-root'>
        <model name='spapr-pci-host-bridge'/>
        <target index='1'/>
        <alias name='pci.1'/>
      </controller>
      <controller type='pci' index='2' model='pci-root'>
        <model name='spapr-pci-host-bridge'/>
        <target index='2'/>
        <alias name='pci.2'/>
      </controller>
      <interface type='bridge'>
        <mac address='fa:16:3e:36:5a:d5'/>
        <source bridge='br-int'/>
        <virtualport type='openvswitch'>
          <parameters interfaceid='bb1b7bfb-bbe9-4560-8dd0-ae1688cdb16c'/>
        </virtualport>
        <target dev='tapbb1b7bfb-bb'/>
        <model type='virtio'/>
        <mtu size='9000'/>
        <alias name='net0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
      </interface>
      <serial type='pty'>
        <source path='/dev/pts/0'/>
        <log file='/var/lib/nova/instances/a49ba344-8b50-4014-baf3-24f8252f212e/console.log' append='off'/>
        <target type='spapr-vio-serial' port='0'>
          <model name='spapr-vty'/>
        </target>
        <alias name='serial0'/>
        <address type='spapr-vio' reg='0x30000000'/>
      </serial>
      <console type='pty' tty='/dev/pts/0'>
        <source path='/dev/pts/0'/>
        <log file='/var/lib/nova/instances/a49ba344-8b50-4014-baf3-24f8252f212e/console.log' append='off'/>
        <target type='serial' port='0'/>
        <alias name='serial0'/>
        <address type='spapr-vio' reg='0x30000000'/>
      </console>
      <input type='tablet' bus='usb'>
        <alias name='input0'/>
        <address type='usb' bus='0' port='1'/>
      </input>
      <input type='keyboard' bus='usb'>
        <alias name='input1'/>
        <address type='usb' bus='0' port='2'/>
      </input>
      <input type='mouse' bus='usb'>
        <alias name='input2'/>
        <address type='usb' bus='0' port='3'/>
      </input>
      <graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0' keymap='en-us'>
        <listen type='address' address='0.0.0.0'/>
      </graphics>
      <video>
        <model type='vga' vram='16384' heads='1' primary='yes'/>
        <alias name='video0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
      </video>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0004' bus='0x04' slot='0x00' function='0x0'/>
        </source>
        <alias name='hostdev0'/>
        <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <driver name='vfio'/>
        <source>
          <address domain='0x0004' bus='0x05' slot='0x00' function='0x0'/>
        </source>
        <alias name='hostdev1'/>
        <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
      </hostdev>
      <memballoon model='virtio'>
        <stats period='10'/>
        <alias name='balloon0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
      </memballoon>
      <panic model='pseries'/>
    </devices>
    <seclabel type='dynamic' model='apparmor' relabel='yes'>
      <label>libvirt-a49ba344-8b50-4014-baf3-24f8252f212e</label>
      <imagelabel>libvirt-a49ba344-8b50-4014-baf3-24f8252f212e</imagelabel>
    </seclabel>
    <seclabel type='dynamic' model='dac' relabel='yes'>
      <label>+64055:+116</label>
      <imagelabel>+64055:+116</imagelabel>
    </seclabel>
  </domain>
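  The domain XML has two <hostdev type='pci'> entries, with host sources 0004:04:00.0 and 0004:05:00.0. The flavor asks for only one. The Python sketch below uses xml.etree.ElementTree on a trimmed excerpt of the dumpxml output, keeping only the two <hostdev> elements. It lists the host PCI addresses and compares their count with the single device requested by pci_passthrough:alias='v100-32:1'. It is a diagnostic aid only and does not model how nova chose the devices.

```python
import xml.etree.ElementTree as ET

# Trimmed excerpt of `virsh dumpxml instance-00003268` above (hostdev elements only).
DOMAIN_XML = """\
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0004' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0004' bus='0x05' slot='0x00' function='0x0'/>
      </source>
      <alias name='hostdev1'/>
    </hostdev>
  </devices>
</domain>
"""

def host_pci_addresses(domain_xml):
    """Return host PCI addresses (DDDD:BB:SS.F) of every <hostdev type='pci'>."""
    root = ET.fromstring(domain_xml)
    addresses = []
    for hostdev in root.findall("./devices/hostdev[@type='pci']"):
        addr = hostdev.find("./source/address")
        dom, bus, slot, fn = (
            int(addr.get(key), 16) for key in ("domain", "bus", "slot", "function")
        )
        addresses.append(f"{dom:04x}:{bus:02x}:{slot:02x}.{fn:x}")
    return addresses

requested = 1  # from pci_passthrough:alias='v100-32:1'
assigned = host_pci_addresses(DOMAIN_XML)
print(assigned)  # ['0004:04:00.0', '0004:05:00.0']
if len(assigned) != requested:
    print(f"mismatch: flavor requests {requested} PCI device(s), domain has {len(assigned)}")
```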

  Environment
  ===========
  1. Canonical OpenStack Rocky on ppc64le
  dpkg -l | grep nova
  ii  nova-api-os-compute               2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - OpenStack Compute API frontend
  ii  nova-common                       2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - common files
  ii  nova-conductor                    2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - conductor service
  ii  nova-consoleauth                  2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - Console Authenticator
  ii  nova-novncproxy                   2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - NoVNC proxy
  ii  nova-placement-api                2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - placement API frontend
  ii  nova-scheduler                    2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - virtual machine scheduler
  ii  python-novaclient                 2:11.0.0-0ubuntu1~cloud0               all          client library for OpenStack Compute API - Python 2.7
  ii  python3-nova                      2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute Python 3 libraries

  2. Which hypervisor did you use?
     QEMU-KVM ppc64le version 2.11
  # lsb_release -a
  No LSB modules are available.
  Distributor ID:	Ubuntu
  Description:	Ubuntu 18.04.3 LTS
  Release:	18.04
  Codename:	bionic

  # dpkg -l | egrep "libvirt|kvm|nova"
  ii  libvirt-clients                       4.0.0-1ubuntu8.13                      ppc64el      Programs for the libvirt library
  ii  libvirt-daemon                        4.0.0-1ubuntu8.13                      ppc64el      Virtualization daemon
  ii  libvirt-daemon-driver-storage-rbd     4.0.0-1ubuntu8.13                      ppc64el      Virtualization daemon RBD storage driver
  ii  libvirt-daemon-system                 4.0.0-1ubuntu8.13                      ppc64el      Libvirt daemon configuration files
  ii  libvirt0:ppc64el                      4.0.0-1ubuntu8.13                      ppc64el      library for interfacing with different virtualization systems
  ii  nova-api-metadata                     2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - metadata API frontend
  ii  nova-common                           2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - common files
  ii  nova-compute                          2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - compute node base
  ii  nova-compute-kvm                      2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - compute node (KVM)
  ii  nova-compute-libvirt                  2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute - compute node libvirt support
  ii  python3-libvirt                       4.0.0-1                                ppc64el      libvirt Python 3 bindings
  ii  python3-nova                          2:18.2.1-0ubuntu1~cloud4               all          OpenStack Compute Python 3 libraries
  ii  python3-novaclient                    2:11.0.0-0ubuntu1~cloud0               all          client library for OpenStack Compute API - 3.x
  ii  qemu-kvm                              1:2.11+dfsg-1ubuntu7.17                ppc64el      QEMU Full virtualization on x86 hardware

  Logs & Configs
  ==============
  Logs and configs attached

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1895316/+subscriptions

