yahoo-eng-team team mailing list archive
Message #87221
[Bug 1943863] Re: DPDK instances are failing to start: Failed to bind socket to /run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file or directory
https://github.com/openstack-charmers/charm-layer-ovn/pull/52
** Also affects: neutron
Importance: Undecided
Status: New
** No longer affects: neutron
** No longer affects: neutron (Ubuntu)
** Also affects: charm-layer-ovn
Importance: Undecided
Status: New
** Changed in: charm-layer-ovn
Status: New => Confirmed
** Changed in: charm-layer-ovn
Importance: Undecided => High
** Changed in: charm-layer-ovn
Assignee: (unassigned) => Liam Young (gnuoy)
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1943863
Title:
DPDK instances are failing to start: Failed to bind socket to
/run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file or directory
Status in charm-layer-ovn:
Confirmed
Status in OpenStack nova-compute charm:
Invalid
Bug description:
== Env
focal/ussuri + ovn, latest stable charms
juju status: https://paste.ubuntu.com/p/2725tV47ym/
Hardware: Huawei CH121 V5 with MZ532,4*25GE Mezzanine Card,PCIE 3.0 X16 NICs + manually installed PMD for DPDK enablement (librte-pmd-hinic20.0 package)
== Problem description
A DPDK instance can't be launched after a fresh deployment
(focal/ussuri + OVN, latest stable charms); it fails with the error below:
$ os server show dpdk-test-instance -f yaml
OS-DCF:diskConfig: MANUAL
OS-EXT-AZ:availability_zone: ''
OS-EXT-SRV-ATTR:host: null
OS-EXT-SRV-ATTR:hypervisor_hostname: null
OS-EXT-SRV-ATTR:instance_name: instance-00000218
OS-EXT-STS:power_state: NOSTATE
OS-EXT-STS:task_state: null
OS-EXT-STS:vm_state: error
OS-SRV-USG:launched_at: null
OS-SRV-USG:terminated_at: null
accessIPv4: ''
accessIPv6: ''
addresses: ''
config_drive: 'True'
created: '2021-09-15T18:51:00Z'
fault:
code: 500
created: '2021-09-15T18:52:01Z'
details: "Traceback (most recent call last):\n File \"/usr/lib/python3/dist-packages/nova/conductor/manager.py\"\
, line 651, in build_instances\n scheduler_utils.populate_retry(\n File \"\
/usr/lib/python3/dist-packages/nova/scheduler/utils.py\", line 919, in populate_retry\n\
\ raise exception.MaxRetriesExceeded(reason=msg)\nnova.exception.MaxRetriesExceeded:\
\ Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance\
\ 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73. Last exception: internal error: process\
\ exited while connecting to monitor: 2021-09-15T18:51:53.485265Z qemu-system-x86_64:\
\ -chardev socket,id=charnet0,path=/run/libvirt-vhost-user/vhu3ba44fdc-7c,server:\
\ Failed to bind socket to /run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file\
\ or directory\n"
message: 'Exceeded maximum number of retries. Exceeded max scheduling attempts 3
for instance 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73. Last exception: internal error:
process exited while connecting to monitor: 2021-09-15T18:51:53.485265Z qemu-system-x86_64:
-chardev '
flavor: m1.medium.project.dpdk (4f452aa3-2b2c-4f2e-8465-5e3c2d8ec3f1)
hostId: ''
id: 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73
image: auto-sync/ubuntu-bionic-18.04-amd64-server-20210907-disk1.img (3851450e-e73d-489b-a356-33650690ed7a)
key_name: ubuntu-keypair
name: dpdk-test-instance
project_id: cdade870811447a89e2f0199373a0d95
properties: ''
status: ERROR
updated: '2021-09-15T18:52:01Z'
user_id: 13a0e7862c6641eeaaebbde1ae096f9e
volumes_attached: ''
For the record, "generic" instances (i.e. non-DPDK/non-SRIOV) are
scheduled and started without any issues.
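To see which directory qemu expected, the socket path can be pulled out of the fault message. This is only a rough sketch: the helper name extract_socket_path is made up for illustration, and it assumes the message keeps the "Failed to bind socket to <path>:" wording shown above.

```shell
# Hypothetical helper: read a fault message on stdin and print the socket
# path that qemu failed to bind.
extract_socket_path() {
    sed -n 's/.*Failed to bind socket to \([^:]*\):.*/\1/p'
}

msg='Failed to bind socket to /run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file or directory'
sock=$(echo "$msg" | extract_socket_path)
echo "socket: $sock"
# The parent directory is what has to exist before qemu can create the socket.
echo "parent: $(dirname "$sock")"
```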
== Steps to reproduce
openstack network create --external --provider-network-type vlan --provider-segment xxx --provider-physical-network dpdkfabric ext_net_dpdk
openstack subnet create --allocation-pool start=<redacted>,end=<redacted> --network ext_net_dpdk --subnet-range <redacted>/23 --gateway <redacted> --no-dhcp ext_net_dpdk_subnet
openstack aggregate create --zone nova dpdk
openstack aggregate set --property dpdk=true dpdk
openstack aggregate add host dpdk <fqdn>
openstack aggregate show dpdk --max-width=80
openstack flavor set --property aggregate_instance_extra_specs:dpdk=true --property hw:mem_page_size=large m1.medium.dpdk
openstack server create --config-drive true --network ext_net_dpdk --key-name ubuntu-keypair --image focal --flavor m1.medium.dpdk dpdk-test-instance
== Analysis
[before redeployment] nova-compute log : https://pastebin.canonical.com/p/FgPYNb3bPj/
[fresh deployment] juju crashdump: https://drive.google.com/file/d/1W_w3CAUq4ggp4alDnpCk08mSaCL6Uaxk/view?usp=sharing
<on hypervisor>
# ovs-vsctl get open_vswitch . other_config
{dpdk-extra="--pci-whitelist 0000:3e:00.0 --pci-whitelist 0000:40:00.0", dpdk-init="true", dpdk-lcore-mask="0x1000001", dpdk-socket-mem="4096,4096"}
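The dpdk-lcore-mask value above is a hexadecimal bit mask of CPU cores, where bit N selects core N. A small sketch for decoding such a mask follows; the function name mask_to_cores is an illustrative assumption, not part of OVS or the charms.

```shell
# Hypothetical helper: print the CPU core indexes selected by a hex bit mask.
mask_to_cores() {
    mask=$(( $1 ))      # shell arithmetic accepts 0x-prefixed hex constants
    core=0
    cores=""
    while [ "$mask" -ne 0 ]; do
        # Bit 0 of the remaining mask corresponds to the current core index.
        if [ $(( mask & 1 )) -eq 1 ]; then
            cores="${cores:+$cores }$core"
        fi
        mask=$(( mask >> 1 ))
        core=$(( core + 1 ))
    done
    echo "$cores"
}

mask_to_cores 0x1000001   # prints: 0 24
```

For the mask on this hypervisor, that gives cores 0 and 24.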
# cat /etc/tmpfiles.d/nova-ovs-vhost-user.conf
# Create libvirt writeable directory for vhost-user sockets
d /run/libvirt-vhost-user 0770 libvirt-qemu kvm - -
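For reference, the fields of the entry above follow the tmpfiles.d(5) column order: Type, Path, Mode, User, Group, Age, Argument. The sketch below only labels those columns; the function name describe_tmpfiles_entry is made up for illustration, and it assumes a simple entry without quoted or space-containing paths.

```shell
# Hypothetical helper: label the columns of one tmpfiles.d entry.
describe_tmpfiles_entry() {
    echo "$1" | {
        # Column order per tmpfiles.d(5): Type Path Mode User Group Age Argument
        read -r entry_type path mode user group age argument
        echo "type=$entry_type path=$path mode=$mode user=$user group=$group age=$age argument=$argument"
    }
}

describe_tmpfiles_entry "d /run/libvirt-vhost-user 0770 libvirt-qemu kvm - -"
```

Type "d" asks systemd-tmpfiles to create the directory if it does not exist yet, here with mode 0770 and owner libvirt-qemu:kvm.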
In fact, none of the compute hosts have that file:
https://paste.ubuntu.com/p/XJRFypbMQf/ (however, the error from this
issue doesn't appear on non-DPDK hosts).
After running the command below, the missing /run/... directory appeared
and the VM could be scheduled and started. However, although it
started, it was not reachable over the network.
# systemd-tmpfiles --create
# stat /run/libvirt-vhost-user
File: /run/libvirt-vhost-user
Size: 40 Blocks: 0 IO Block: 4096 directory
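The same check can be repeated on each compute host with a few lines of shell. This is a minimal sketch, not something shipped by the charms: the function name check_vhost_dir is hypothetical, and the stat format string assumes GNU coreutils.

```shell
# Hypothetical helper: report whether a vhost-user socket directory is present.
# Returns 0 if the path exists as a directory, 1 otherwise.
check_vhost_dir() {
    dir=$1
    if [ -d "$dir" ]; then
        # GNU stat: %a = octal mode, %U = owner name, %G = group name
        echo "present: $dir ($(stat -c '%a %U %G' "$dir"))"
        return 0
    fi
    echo "missing: $dir" >&2
    return 1
}

check_vhost_dir /run/libvirt-vhost-user ||
    echo "run 'systemd-tmpfiles --create' as root, then re-check"
```

On a host where the workaround has been applied, the reported mode and ownership should match the tmpfiles.d entry (0770, libvirt-qemu:kvm).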
To manage notifications about this bug go to:
https://bugs.launchpad.net/charm-layer-ovn/+bug/1943863/+subscriptions