yahoo-eng-team team mailing list archive

[Bug 1777608] Re: Nova compute calls plug_vifs unnecessarily for ironic nodes in init_host

 

Reviewed:  https://review.opendev.org/c/openstack/nova/+/813263
Committed: https://opendev.org/openstack/nova/commit/7f81cf28bf21ad2afa98accfde3087c83b8e269b
Submitter: "Zuul (22348)"
Branch:    master

commit 7f81cf28bf21ad2afa98accfde3087c83b8e269b
Author: Julia Kreger <juliaashleykreger@xxxxxxxxx>
Date:   Fri Oct 8 14:35:00 2021 -0700

    Ignore plug_vifs on the ironic driver
    
    When the nova-compute service starts, by default it attempts to
    start up instance configuration state for aspects such as networking.
    This is fine in most cases, and makes a lot of sense if the
    nova-compute service is just managing virtual machines on a hypervisor.
    
    This is done, one instance at a time.
    
    However, when the compute driver is ironic, the networking is managed
    as part of the physical machine lifecycle potentially all the way into
    committed switch configurations. As such, there is no need to attempt
    to call ``plug_vifs`` on every single instance managed by the
    nova-compute process which is backed by Ironic.
    
    Additionally, a nova-compute service instance using ironic tends to
    manage far more physical machines than one co-installed with a
    hypervisor. Often this means a cluster of a thousand machines,
    with three controllers, will see thousands of unneeded API calls upon
    service start, which lengthens the entire process and negatively
    impacts operations.
    
    In essence, nova.virt.ironic's plug_vifs call now does nothing,
    and merely issues a debug LOG entry when called.
    
    Closes-Bug: #1777608
    Change-Id: Iba87cef50238c5b02ab313f2311b826081d5b4ab
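
The effect described in the commit message can be illustrated with a minimal
sketch. This is not the nova code: LegacyDriver, NoOpDriver, and init_host are
hypothetical stand-ins, and the "one API request per VIF" cost is an assumption
made only for illustration.

```python
import logging

LOG = logging.getLogger(__name__)


class LegacyDriver:
    """Hypothetical stand-in for the old behaviour; not nova code."""

    def __init__(self):
        self.api_calls = 0  # simulated Ironic API requests

    def plug_vifs(self, instance, network_info):
        # Simplified assumption: one API request per VIF.
        self.api_calls += len(network_info)


class NoOpDriver(LegacyDriver):
    """Hypothetical stand-in for the behaviour the commit describes."""

    def plug_vifs(self, instance, network_info):
        # Do nothing except emit a debug log entry.
        LOG.debug("plug_vifs called for instance %s; ignoring.", instance)


def init_host(driver, instances):
    """Simplified startup loop: one plug_vifs call per instance."""
    for instance, network_info in instances.items():
        driver.plug_vifs(instance, network_info)


# A thousand single-VIF instances, as in the commit message's example.
instances = {"instance-%04d" % i: ["vif-%04d" % i] for i in range(1000)}

legacy, noop = LegacyDriver(), NoOpDriver()
init_host(legacy, instances)
init_host(noop, instances)
print(legacy.api_calls, noop.api_calls)
```

Under these assumptions the legacy stand-in issues one simulated request per
instance at startup, while the no-op stand-in issues none.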


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1777608

Title:
  Nova compute calls plug_vifs unnecessarily for ironic nodes in
  init_host

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Originally reported at
  https://bugzilla.redhat.com/show_bug.cgi?id=1590297#c14 and tracked
  at https://bugzilla.redhat.com/show_bug.cgi?id=1592427

  @owalsh: Looks like a race in the service startup.

  (undercloud) [stack@undercloud ~]$ openstack server list
  +--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+
  | ID                                   | Name                    | Status | Networks               | Image                  | Flavor       |
  +--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+
  | fcda23f8-7b98-41bf-9e45-7a4579164874 | overcloud-controller-1  | ERROR  | ctlplane=192.168.24.7  | overcloud-full_renamed | oooq_control |
  | ced0e0db-ad1f-4381-a523-0a79ae0303ff | overcloud-controller-2  | ERROR  | ctlplane=192.168.24.21 | overcloud-full_renamed | oooq_control |
  | 0731685f-a8fc-4126-868a-bdf9879238f3 | overcloud-controller-0  | ERROR  | ctlplane=192.168.24.12 | overcloud-full_renamed | oooq_control |
  | c4fd4e28-fe44-40d2-9b40-7196b7c3b6a2 | overcloud-novacompute-2 | ERROR  | ctlplane=192.168.24.15 | overcloud-full_renamed | oooq_compute |
  | 5bef3afd-4485-4cbc-b76c-f126c85bd015 | overcloud-novacompute-1 | ERROR  | ctlplane=192.168.24.8  | overcloud-full_renamed | oooq_compute |
  | e4c9da33-c452-446c-82c4-bd55a6b294d8 | overcloud-novacompute-0 | ERROR  | ctlplane=192.168.24.9  | overcloud-full_renamed | oooq_compute |
  +--------------------------------------+-------------------------+--------+------------------------+------------------------+--------------+

  Looking at controller-1...

  (undercloud) [stack@undercloud ~]$ openstack server show overcloud-controller-1
  +-------------------------------------+---------------------------------------------------------------+
  | Field                               | Value                                                         |
  +-------------------------------------+---------------------------------------------------------------+
  | OS-DCF:diskConfig                   | MANUAL                                                        |
  | OS-EXT-AZ:availability_zone         | nova                                                          |
  | OS-EXT-SRV-ATTR:host                | undercloud                                                    |
  | OS-EXT-SRV-ATTR:hypervisor_hostname | b6a32fba-b57a-4e7b-a6ce-99941b4d134d                          |
  | OS-EXT-SRV-ATTR:instance_name       | instance-00000005                                             |
  | OS-EXT-STS:power_state              | Running                                                       |
  | OS-EXT-STS:task_state               | None                                                          |
  | OS-EXT-STS:vm_state                 | error                                                         |
  | OS-SRV-USG:launched_at              | 2018-06-11T14:56:34.000000                                    |
  | OS-SRV-USG:terminated_at            | None                                                          |
  | accessIPv4                          |                                                               |
  | accessIPv6                          |                                                               |
  | addresses                           | ctlplane=192.168.24.7                                         |
  | config_drive                        | True                                                          |
  | created                             | 2018-06-11T14:53:09Z                                          |
  | flavor                              | oooq_control (04f8ba26-e9bd-472f-b92a-919e7ec8bed1)           |
  | hostId                              | da8848b6f3dc77a51235f70dcf44df197261eca0529173f670c94ce9      |
  | id                                  | fcda23f8-7b98-41bf-9e45-7a4579164874                          |
  | image                               | overcloud-full_renamed (4d965d80-61fc-45d0-88e1-e6365c4afd57) |
  | key_name                            | default                                                       |
  | name                                | overcloud-controller-1                                        |
  | project_id                          | 3b778414471a47e4b0760bbecc9d4070                              |
  | properties                          |                                                               |
  | security_groups                     | name='default'                                                |
  | status                              | ERROR                                                         |
  | updated                             | 2018-06-15T09:29:17Z                                          |
  | user_id                             | 356b9a5451c643bb8162b9349bc9487b                              |
  | volumes_attached                    |                                                               |
  +-------------------------------------+---------------------------------------------------------------+

  From /var/log/ironic/ironic-conductor.log I can see that
  ironic-conductor started loading extensions after the instance's
  "updated" timestamp (2018-06-15T09:29:17Z):

  2018-06-15 09:29:18.576 1408 DEBUG oslo_concurrency.lockutils [req-6ad8cb26-b607-4f65-a8b7-30ac72a7a997 - - - - -] Lock "extension_manager" acquired by "ironic.common.driver_factory._init_extension_manager" :: waited 0.000s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:273

  The errors in /var/log/ironic/app.log occurred before this, because
  there wasn't a conductor registered that supports ipmi:

  2018-06-15 09:29:14.419 1793 DEBUG wsme.api [req-81ce866e-1c15-4499-a452-044a44efa1c0 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
  2018-06-15 09:29:15.304 1792 DEBUG wsme.api [req-be8a2443-12ad-4115-9640-b91b39d15765 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
  2018-06-15 09:29:16.105 1793 DEBUG wsme.api [req-0f2c2010-3bce-4600-8c8a-54a6aa0b8965 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
  2018-06-15 09:29:16.797 1792 DEBUG wsme.api [req-e660c0cd-8a25-4b93-815f-edc26c15ddba 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
  2018-06-15 09:29:17.596 1793 DEBUG wsme.api [req-ffa50a06-3a62-4f3e-937c-4304818dca4c 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
  2018-06-15 09:29:18.277 1793 DEBUG wsme.api [req-c391da42-997e-49ee-b604-11ab81f875e2 1a609f3c25c24c45ac30ee2fcc721eac 5909cc46dbad48058daa89d48f07ba71 - default default] Client-side error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. format_exception /usr/lib/python2.7/site-packages/wsme/api.py:222
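
The ordering claimed above can be checked mechanically. The following is a
minimal sketch that compares the leading timestamps of the log lines quoted in
this report; parse_ts is a hypothetical helper, and only the first and last
API errors are included for brevity.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' stamp of a log line."""
    return datetime.strptime(line[:23], FMT)


# First and last "No conductor service registered" errors from app.log.
api_errors = [
    parse_ts("2018-06-15 09:29:14.419 1793 DEBUG wsme.api"),
    parse_ts("2018-06-15 09:29:18.277 1793 DEBUG wsme.api"),
]
# ironic-conductor acquiring the "extension_manager" lock.
conductor_init = parse_ts(
    "2018-06-15 09:29:18.576 1408 DEBUG oslo_concurrency.lockutils"
)

# True when every API error was logged before the conductor began
# initialising its extension manager.
errors_precede_init = all(ts < conductor_init for ts in api_errors)
gap = conductor_init - max(api_errors)
print(errors_precede_init, gap.total_seconds())
```

For the lines quoted here, every error precedes the conductor's extension
loading, which is consistent with the startup race described at the top of
this report.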

  And /var/log/nova/nova-compute.log:

  2018-06-15 09:29:12.007 2623 INFO nova.service [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Starting compute node (version 18.0.0-0.20180601221704.f902e0d.el7)
  2018-06-15 09:29:12.135 2623 DEBUG nova.servicegroup.drivers.db [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Seems service nova-compute on host undercloud is down. Last heartbeat was 2018-06-15 09:27:40. Elapsed time is 92.135605 is_up /usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py:79
  2018-06-15 09:29:12.260 2623 DEBUG nova.compute.manager [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Checking state _get_power_state /usr/lib/python2.7/site-packages/nova/compute/manager.py:1167
  2018-06-15 09:29:13.819 2623 DEBUG nova.virt.ironic.driver [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] plug: instance_uuid=5bef3afd-4485-4cbc-b76c-f126c85bd015 vif=[{"profile": {}, "ovs_interfaceid": null, "preserve_on_delete": true, "network": {"bridge": null, "subnets": [{"ips": [{"meta": {}, "version": 4, "type": "fixed", "floating_ips": [], "address": "192.168.24.8"}], "version": 4, "meta": {"dhcp_server": "192.168.24.5"}, "dns": [{"meta": {}, "version": 4, "type": "dns", "address": "192.168.23.1"}], "routes": [{"interface": null, "cidr": "169.254.169.254/32", "meta": {}, "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.24.1"}}], "cidr": "192.168.24.0/24", "gateway": {"meta": {}, "version": 4, "type": "gateway", "address": "192.168.24.1"}}], "meta": {"injected": false, "tenant_id": "3b778414471a47e4b0760bbecc9d4070", "mtu": 1500}, "id": "1fe54003-4d8c-4b37-9f20-f04216ac4e26", "label": "ctlplane"}, "devname": "tapcbd2c5be-32", "vnic_type": "baremetal", "qbh_params": null, "meta": {}, "details": {}, "address": "00:1a:24:52:db:70", "active": true, "type": "other", "id": "cbd2c5be-32fd-4c5c-9a14-43e323071912", "qbg_params": null}] _plug_vifs /usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py:1397
  2018-06-15 09:29:14.422 2623 ERROR nova.virt.ironic.driver [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400): BadRequest: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [req-42a10a27-6527-45d3-9b7d-0c4dc2ac2f13 - - - - -] [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Vifs plug failed: VirtualInterfacePlugException: Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] Traceback (most recent call last):
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 942, in _init_instance
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] self.driver.plug_vifs(instance, net_info)
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1441, in plug_vifs
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] self._plug_vifs(node, instance, network_info)
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] File "/usr/lib/python2.7/site-packages/nova/virt/ironic/driver.py", line 1408, in _plug_vifs
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] raise exception.VirtualInterfacePlugException(msg)
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015] VirtualInterfacePlugException: Cannot attach VIF cbd2c5be-32fd-4c5c-9a14-43e323071912 to the node 059ac0f4-3a78-4f67-9f75-57a7a1f9ec5c due to error: No valid host was found. Reason: No conductor service registered which supports driver ipmi. (HTTP 400)
  2018-06-15 09:29:14.422 2623 ERROR nova.compute.manager [instance: 5bef3afd-4485-4cbc-b76c-f126c85bd015]

  etc... for the other instances

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1777608/+subscriptions


