← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1807352] Re: Virtual devices in Windows VMs occasionally loose connection to host

 

[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1807352

Title:
  Virtual devices in Windows VMs occasionally loose connection to host

Status in OpenStack Compute (nova):
  Expired

Bug description:
  Description
  ===========
  Multiple Windows virtual machine running on our OpenStack cloud seems to occasionally loose the connection from their disk & network virtual device (virtio) to the underlying host. The Windows VM is then no longer able to write down to the disk (see below for logs). Additionally the network connection is shown as "No internet access" (yellow alert symbol). Rebooting the affected VM fixes the problems.

  Steps to reproduce
  ==================
  We don't know how to reproduce the situation.

  Expected result
  ===============
  Windows VMs should run without any virtual device connectivity issues.

  Actual result
  =============
  Virtual devices in Windows VMs occasionally loose connection to host, as described above.

  Environment
  ===========
  Host:
  * CentOS Linux release 7.5.1804 (Core) / Kernel: 3.10.0-862
  * Nova: openstack-nova-compute-15.1.3-1
  * Hypervisor: Libvirt + KVM (Libvirt 3.9.0 / QEMU 2.10.0)
  * Storage: Ceph 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
  * Networking:  Neutron with OpenVSwitch

  VM:
  * Windows Server 2016 Standard
  * Red Hat VirtIO SCSI Disk Device (Driver version: 10.0.14393.1613)
  * Red Hat VirtIO Ethernet Adapter (Driver version: 100.76.104.14900)
  * Network: Static IP configuration

  Libvirt XML (critical but irelevant data replaced by '***'):
  <domain type='kvm' id='466'>
    <name>***</name>
    <uuid>***</uuid>
    <metadata>
      <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0";>
        <nova:package version="15.1.3-1.el7"/>
        <nova:name>***</nova:name>
        <nova:creationTime>2018-12-05 07:11:13</nova:creationTime>
        <nova:flavor name="c04m036">
          <nova:memory>36864</nova:memory>
          <nova:disk>0</nova:disk>
          <nova:swap>0</nova:swap>
          <nova:ephemeral>0</nova:ephemeral>
          <nova:vcpus>4</nova:vcpus>
        </nova:flavor>
        <nova:owner>
          <nova:user uuid="***">***</nova:user>
          <nova:project uuid="***">***</nova:project>
        </nova:owner>
        <nova:root type="image" uuid="***"/>
      </nova:instance>
    </metadata>
    <memory unit='KiB'>37748736</memory>
    <currentMemory unit='KiB'>37748736</currentMemory>
    <vcpu placement='static'>4</vcpu>
    <cputune>
      <shares>4096</shares>
    </cputune>
    <resource>
      <partition>/machine</partition>
    </resource>
    <sysinfo type='smbios'>
      <system>
        <entry name='manufacturer'>RDO</entry>
        <entry name='product'>OpenStack Compute</entry>
        <entry name='version'>15.1.3-1.el7</entry>
        <entry name='serial'>9646161a-aa15-4a0a-b20f-f2a984052149</entry>
        <entry name='uuid'>***</entry>
        <entry name='family'>Virtual Machine</entry>
      </system>
    </sysinfo>
    <os>
      <type arch='x86_64' machine='pc-q35-rhel7.5.0'>hvm</type>
      <boot dev='hd'/>
      <smbios mode='sysinfo'/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <hyperv>
        <relaxed state='on'/>
        <vapic state='on'/>
        <spinlocks state='on' retries='8191'/>
      </hyperv>
    </features>
    <cpu mode='custom' match='exact' check='full'>
      <model fallback='forbid'>SandyBridge-IBRS</model>
      <topology sockets='4' cores='1' threads='1'/>
      <feature policy='require' name='vme'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='arat'/>
      <feature policy='require' name='xsaveopt'/>
    </cpu>
    <clock offset='localtime'>
      <timer name='pit' tickpolicy='delay'/>
      <timer name='rtc' tickpolicy='catchup'/>
      <timer name='hpet' present='no'/>
      <timer name='hypervclock' present='yes'/>
    </clock>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <devices>
      <emulator>/usr/libexec/qemu-kvm</emulator>
      <disk type='network' device='disk'>
        <driver name='qemu' type='raw' cache='writeback'/>
        <auth username='***'>
          <secret type='ceph' uuid='***'/>
        </auth>
        <source protocol='rbd' name='***'>
          <host name='***' port='***'/>
          <host name='***' port='***'/>
          <host name='***' port='***'/>
        </source>
        <target dev='vda' bus='virtio'/>
        <serial>***</serial>
        <alias name='virtio-disk0'/>
        <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </disk>
      <disk type='network' device='disk'>
        <driver name='qemu' type='raw' cache='writeback'/>
        <auth username='***'>
          <secret type='ceph' uuid='***'/>
        </auth>
        <source protocol='rbd' name='***'>
          <host name='***' port='***'/>
          <host name='***' port='***'/>
          <host name='***' port='***'/>
        </source>
        <target dev='vdb' bus='virtio'/>
        <serial>***</serial>
        <alias name='virtio-disk1'/>
        <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      </disk>
      <controller type='usb' index='0' model='nec-xhci'>
        <alias name='usb'/>
        <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </controller>
      <controller type='sata' index='0'>
        <alias name='ide'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
      </controller>
      <controller type='pci' index='0' model='pcie-root'>
        <alias name='pcie.0'/>
      </controller>
      <controller type='virtio-serial' index='0'>
        <alias name='virtio-serial0'/>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </controller>
      <controller type='pci' index='1' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='1' port='0x10'/>
        <alias name='pci.1'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
      </controller>
      <controller type='pci' index='2' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='2' port='0x11'/>
        <alias name='pci.2'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
      </controller>
      <controller type='pci' index='3' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='3' port='0x12'/>
        <alias name='pci.3'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
      </controller>
      <controller type='pci' index='4' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='4' port='0x13'/>
        <alias name='pci.4'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
      </controller>
      <controller type='pci' index='5' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='5' port='0x14'/>
        <alias name='pci.5'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
      </controller>
      <controller type='pci' index='6' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='6' port='0x15'/>
        <alias name='pci.6'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
      </controller>
      <interface type='bridge'>
        <mac address='***'/>
        <source bridge='qbr70392c0e-f1'/>
        <target dev='tap70392c0e-f1'/>
        <model type='virtio'/>
        <alias name='net0'/>
        <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </interface>
      <serial type='pty'>
        <source path='/dev/pts/12'/>
        <log file='/var/lib/nova/instances/***/console.log' append='off'/>
        <target type='isa-serial' port='0'>
          <model name='isa-serial'/>
        </target>
        <alias name='serial0'/>
      </serial>
      <console type='pty' tty='/dev/pts/12'>
        <source path='/dev/pts/12'/>
        <log file='/var/lib/nova/instances/***/console.log' append='off'/>
        <target type='serial' port='0'/>
        <alias name='serial0'/>
      </console>
      <channel type='spicevmc'>
        <target type='virtio' name='com.redhat.spice.0' state='disconnected'/>
        <alias name='channel0'/>
        <address type='virtio-serial' controller='0' bus='0' port='1'/>
      </channel>
      <channel type='unix'>
        <source mode='bind' path='/var/lib/libvirt/qemu/org.qemu.guest_agent.0.***.sock'/>
        <target type='virtio' name='org.qemu.guest_agent.0' state='connected'/>
        <alias name='channel1'/>
        <address type='virtio-serial' controller='0' bus='0' port='2'/>
      </channel>
      <input type='mouse' bus='ps2'>
        <alias name='input0'/>
      </input>
      <input type='keyboard' bus='ps2'>
        <alias name='input1'/>
      </input>
      <graphics type='spice' port='***' autoport='yes' listen='0.0.0.0' keymap='en-us'>
        <listen type='address' address='***'/>
      </graphics>
      <video>
        <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
        <alias name='video0'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
      </video>
      <memballoon model='virtio'>
        <stats period='10'/>
        <alias name='balloon0'/>
        <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </memballoon>
    </devices>
    <seclabel type='dynamic' model='selinux' relabel='yes'>
      <label>system_u:system_r:svirt_t:s0:c9,c745</label>
      <imagelabel>system_u:object_r:svirt_image_t:s0:c9,c745</imagelabel>
    </seclabel>
    <seclabel type='dynamic' model='dac' relabel='yes'>
      <label>+107:+107</label>
      <imagelabel>+107:+107</imagelabel>
    </seclabel>
  </domain>

  Logs & Configs
  ==============
  Compute and storage servers:
  No relevant log entries found.

  Windows VM:
  Level	Date and Time	Source	Event ID	Task Category
  Warning	05.12.2018 07:54:44	vhdmp	129	None	"\Device\RaidPort2
  "
  Warning	05.12.2018 07:54:37	Disk	153	None	The IO operation at logical block address 0xa968 for Disk 9 (PDO name: \Device\00000f0d) was retried.
  Warning	05.12.2018 07:54:33	NTFS	50	None	{Delayed Write Failed} Windows was unable to save all the data for the file . The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
  Information	05.12.2018 07:54:33	Application Popup	26	None	Application popup: Windows - Delayed Write Failed : Exception Processing Message 0xc0000222 Parameters 0x7ffdf3051d28 0x7ffdf3051d28 0x7ffdf3051d28 0x7ffdf3051d28
  Information	05.12.2018 07:54:33	Application Popup	26	None	Application popup: Windows - Delayed Write Failed : Exception Processing Message 0xc0000222 Parameters 0x7ffdf3051d28 0x7ffdf3051d28 0x7ffdf3051d28 0x7ffdf3051d28
  Warning	05.12.2018 07:54:33	NTFS	50	None	{Delayed Write Failed} Windows was unable to save all the data for the file \$Extend\$UsnJrnl:$J:$DATA. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
  Warning	05.12.2018 07:54:33	NTFS	50	None	{Delayed Write Failed} Windows was unable to save all the data for the file \AppData\Local\Microsoft\Windows\WebCache\V01.loglog. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
  Warning	05.12.2018 07:54:33	NTFS	50	None	{Delayed Write Failed} Windows was unable to save all the data for the file \AppData\Local\Microsoft\Windows\WebCache\WebCacheV01.jfm. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.
  Information	05.12.2018 07:54:33	Application Popup	26	None	Application popup: Windows - Delayed Write Failed : Exception Processing Message 0xc0000222 Parameters 0x7ffdf3051d28 0x7ffdf3051d28 0x7ffdf3051d28 0x7ffdf3051d28
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Error	05.12.2018 07:54:32	NTFS	137	(2)	The default transaction resource manager on volume \\?\Volume{de53bca7-0000-0000-0000-100000000000} encountered a non-retryable error and could not start.  The data contains the error code.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Microsoft-Windows-Ntfs	140	None	"The system failed to flush data to the transaction log. Corruption may occur in VolumeId: \\?\Volume{de53bca7-0000-0000-0000-100000000000}, DeviceName: \Device\HarddiskVolume100.
  (A device which does not exist was specified.)"
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	153	None	The IO operation at logical block address 0x2dc40 for Disk 6 (PDO name: \Device\00000ea0) was retried.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	51	None	An error was detected on device \Device\Harddisk6\DR98 during a paging operation.
  Warning	05.12.2018 07:54:32	Disk	153	None	The IO operation at logical block address 0xace0 for Disk 6 (PDO name: \Device\00000ea0) was retried.
  Warning	05.12.2018 07:54:32	Disk	153	None	The IO operation at logical block address 0x32ac0 for Disk 6 (PDO name: \Device\00000ea0) was retried.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1807352/+subscriptions


References