
yahoo-eng-team team mailing list archive

[Bug 1651704] Re: Errors when starting introspection are silently ignored

 

Reviewed:  https://review.openstack.org/418423
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=c7b01eba55e5d133ccc19451cf4727170a5dbdd0
Submitter: Jenkins
Branch:    master

commit c7b01eba55e5d133ccc19451cf4727170a5dbdd0
Author: Dougal Matthews <dougal@xxxxxxxxxx>
Date:   Tue Jan 10 14:35:36 2017 +0000

    Fail the baremetal workflows when sending a "FAILED" message
    
    When Mistral workflows execute a second workflow (a sub-workflow
    execution), the parent workflow can't easily determine if the
    sub-workflow failed. This is because the failure is communicated only
    via a Zaqar message, and when a workflow ends by successfully sending
    that message it appears to have been successful. This problem surfaces
    because parent workflows should have an "on-error" attribute, but it is
    never called because the sub-workflow never errors.
    
    This change marks the workflow as failed if the message has the status
    "FAILED". Now when a sub-workflow fails, the task that called it should
    have its on-error handler triggered. Previously it would always go to
    on-success.
    
    Closes-Bug: #1651704
    Change-Id: I60444ec692351c44753649b59b7c1d7c4b61fa8e
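The failure-propagation problem described in the commit message can be illustrated with a small, hypothetical Python model. It is not the Mistral or tripleo-common API: `Execution`, `run_sub_workflow` and `next_transition` are illustrative names, and the only assumption carried over from the commit is that the parent task chooses on-success or on-error from the sub-workflow's execution state, not from the Zaqar message payload.

```python
from dataclasses import dataclass


@dataclass
class Execution:
    """Hypothetical stand-in for a finished workflow execution."""
    state: str           # "SUCCESS" or "ERROR", as seen by the parent
    message_status: str  # status sent in the final Zaqar message


def run_sub_workflow(message_status: str, fail_on_failed_message: bool) -> Execution:
    # Sending the Zaqar message itself always succeeds, so without the fix
    # the execution ends in SUCCESS whatever status the payload carries.
    if fail_on_failed_message and message_status == "FAILED":
        # With the fix, a "FAILED" message also fails the execution.
        return Execution(state="ERROR", message_status=message_status)
    return Execution(state="SUCCESS", message_status=message_status)


def next_transition(sub: Execution) -> str:
    # The parent task only looks at the execution state, not at the
    # message payload, when choosing its transition.
    return "on-error" if sub.state == "ERROR" else "on-success"


if __name__ == "__main__":
    before = run_sub_workflow("FAILED", fail_on_failed_message=False)
    after = run_sub_workflow("FAILED", fail_on_failed_message=True)
    print(next_transition(before))  # on-success
    print(next_transition(after))   # on-error
```

The model shows why the on-error handler was never reached: the failure information lived only in the message payload, which the parent never inspects.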


** Changed in: tripleo
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1651704

Title:
  Errors when starting introspection are silently ignored

Status in Ironic Inspector:
  Incomplete
Status in OpenStack Compute (nova):
  Invalid
Status in tripleo:
  Fix Released
Status in ironic-inspector package in Ubuntu:
  Invalid

Bug description:
  Running tripleo using tripleo-quickstart with the minimal profile
  (step_introspect: true) on the master branch, the overcloud deploy
  fails with the error:

      ResourceInError: resources.Controller: Went to status ERROR due to
      "Message: No valid host was found. There are not enough hosts
      available., Code: 500"

  Looking at nova-scheduler.log, the following errors are found:

      https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-806/undercloud/var/log/nova/nova-scheduler.log.gz

      2016-12-21 06:45:56.822 17759 DEBUG nova.scheduler.host_manager [req-f889dbc0-1096-4f92-80fc-3c5bdcb1ad29 4f103e0230074c2488b7359bc079d323 f21dbfb3b2c840059ec2a0bba03b7385 - - -] Update host state from compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='',created_at=2016-12-21T06:38:28Z,current_workload=0,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=0,free_disk_gb=0,free_ram_mb=0,host='undercloud',host_ip=192.168.23.46,hypervisor_hostname='c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec',hypervisor_type='ironic',hypervisor_version=1,id=2,local_gb=0,local_gb_used=0,memory_mb=0,memory_mb_used=0,metrics='[]',numa_topology=None,pci_device_pools=PciDevicePoolList,ram_allocation_ratio=1.0,running_vms=0,service_id=None,stats={boot_option='local',cpu_aes='true',cpu_arch='x86_64',cpu_hugepages='true',cpu_hugepages_1g='true',cpu_vt='true',profile='control'},supported_hv_specs=[HVSpec],updated_at=2016-12-21T06:45:38Z,uuid=ac2742da-39fb-4ca4-9f78-8e04f703c7a6,vcpus=0,vcpus_used=0) _locked_update /usr/lib/python2.7/site-packages/nova/scheduler/host_manager.py:168

      2016-12-21 06:47:48.893 17759 DEBUG nova.scheduler.filters.ram_filter [req-2aece1c8-6d3e-457b-92d7-a3177680c82e 4f103e0230074c2488b7359bc079d323 f21dbfb3b2c840059ec2a0bba03b7385 - - -] (undercloud, c6f8f4ba-9c7c-4c87-b95a-67a5861b7bec) ram: 0MB disk: 0MB io_ops: 0 instances: 0 does not have 8192 MB usable ram before overcommit, it only has 0 MB. host_passes /usr/lib/python2.7/site-packages/nova/scheduler/filters/ram_filter.py:45

      2016-12-21 06:47:48.894 17759 INFO nova.filters [req-2aece1c8-6d3e-457b-92d7-a3177680c82e 4f103e0230074c2488b7359bc079d323 f21dbfb3b2c840059ec2a0bba03b7385 - - -] Filter RamFilter returned 0 hosts
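The RamFilter rejection can be reproduced with a simplified sketch of the arithmetic. It assumes the filter computes usable RAM as (total usable RAM × ram_allocation_ratio) − used RAM and compares that with the flavor's RAM; the function names here are hypothetical and this is not nova's actual code. The inputs come from the ComputeNode record in the log: memory_mb=0, free_ram_mb=0, ram_allocation_ratio=1.0, with a request for 8192 MB.

```python
def usable_ram_mb(total_usable_ram_mb: int, free_ram_mb: int,
                  ram_allocation_ratio: float) -> float:
    """Usable RAM under a simplified RamFilter-style calculation."""
    memory_mb_limit = total_usable_ram_mb * ram_allocation_ratio
    used_ram_mb = total_usable_ram_mb - free_ram_mb
    return memory_mb_limit - used_ram_mb


def host_passes(requested_ram_mb: int, total_usable_ram_mb: int,
                free_ram_mb: int, ram_allocation_ratio: float) -> bool:
    # The host passes only if the usable RAM covers the requested RAM.
    return usable_ram_mb(total_usable_ram_mb, free_ram_mb,
                         ram_allocation_ratio) >= requested_ram_mb


# Values from the ComputeNode record in the log: the node reports no memory.
print(usable_ram_mb(0, 0, 1.0))      # 0.0
print(host_passes(8192, 0, 0, 1.0))  # False
```

With a node that reports 0 MB of memory, no ratio can make it pass, so the failure points at the node's reported capacity rather than at the filter settings.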

  My guess is that node introspection is failing to get proper node
  information: the ironic compute node reports memory_mb=0, local_gb=0
  and vcpus=0, so RamFilter rejects it.
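One way to test this guess is to check whether the node's capacity properties were ever populated. The sketch below is a hypothetical helper that works on a plain properties dictionary, for example the properties field of `openstack baremetal node show <node> -f json`. It assumes that the nova ironic driver derives a node's reported memory, vCPUs and disk from the `memory_mb`, `cpus` and `local_gb` properties.

```python
# Assumption: the nova ironic driver reads these node properties to report
# memory, vCPUs and disk for a bare metal node.
CAPACITY_PROPERTIES = ("memory_mb", "cpus", "local_gb")


def missing_capacity_properties(properties: dict) -> list:
    """Return the capacity properties that are absent, non-numeric or <= 0."""
    missing = []
    for key in CAPACITY_PROPERTIES:
        value = properties.get(key)
        try:
            # Ironic may store these values as strings, so convert first.
            populated = value is not None and int(value) > 0
        except (TypeError, ValueError):
            populated = False
        if not populated:
            missing.append(key)
    return missing


print(missing_capacity_properties({"cpu_arch": "x86_64"}))
# ['memory_mb', 'cpus', 'local_gb']
print(missing_capacity_properties({"memory_mb": "8192", "cpus": "4", "local_gb": "40"}))
# []
```

A non-empty result for a node that was supposedly introspected would be consistent with introspection failing without the error being reported.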

  Full logs can be found in https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-806/undercloud/

  We have hit this issue twice in recent runs.
