
yahoo-eng-team team mailing list archive

[Bug 1931702] Re: BUG: soft lockup - CPU#0 stuck for 22s! in Cirros 0.5.2 while detaching a volume

 

[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
       Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1931702

Title:
  BUG: soft lockup - CPU#0 stuck for 22s! in Cirros 0.5.2 while
  detaching a volume

Status in OpenStack Compute (nova):
  Expired

Bug description:
  Description
  ===========

  test_live_block_migration_with_attached_volume fails during cleanup
  while detaching a volume from an instance that, as the test name
  suggests, has been live migrated. We don't have the complete console
  output for some reason, but the part we do have shows the following
  soft lockup:

  https://933286ee423f4ed9028e-1eceb8a6fb7f917522f65bda64a8589f.ssl.cf5.rackcdn.com/794766/2/check/nova-
  grenade-multinode/a5ff180/

  [   40.741525] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [run-parts:288]
  [   40.745566] Modules linked in: ahci libahci ip_tables x_tables nls_utf8 nls_iso8859_1 nls_ascii isofs hid_generic usbhid hid virtio_rng virtio_gpu drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm virtio_scsi virtio_net net_failover failover virtio_input virtio_blk qemu_fw_cfg 9pnet_virtio 9pnet pcnet32 8139cp mii ne2k_pci 8390 e1000
  [   40.750740] CPU: 0 PID: 288 Comm: run-parts Not tainted 5.3.0-26-generic #28~18.04.1-Ubuntu
  [   40.751458] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.13.0-1ubuntu1.1 04/01/2014
  [   40.753365] RIP: 0010:__switch_to_asm+0x42/0x70
  [   40.754190] Code: 48 8b 9e c8 08 00 00 65 48 89 1c 25 28 00 00 00 49 c7 c4 10 00 00 00 e8 07 00 00 00 f3 90 0f ae e8 eb f9 e8 07 00 00 00 f3 90 <0f> ae e8 eb f9 49 ff cc 75 e3 48 81 c4 00 01 00 00 41 5f 41 5e 41
  [   40.755739] RSP: 0018:ffffb6a9c027bdb8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
  [   40.756419] RAX: 0000000000000018 RBX: ffff97eec71e6000 RCX: 3c434e4753444bff
  [   40.757057] RDX: 0001020304050608 RSI: 8080808080808080 RDI: 0000000000000fe0
  [   40.757659] RBP: ffffb6a9c027bde8 R08: fefefefefefefeff R09: 0000000000000000
  [   40.758268] R10: 0000000000000fc8 R11: 0000000040042000 R12: 00007ffd9666df63
  [   40.758954] R13: 0000000000000000 R14: 0000000000000001 R15: 00000000000007ff
  [   40.759654] FS:  00007f55b7e936a0(0000) GS:ffff97eec7600000(0000) knlGS:0000000000000000
  [   40.760334] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   40.760830] CR2: 00000000006ad340 CR3: 0000000003cc8000 CR4: 00000000000006f0
  [   40.761685] Call Trace:
  [   40.762767]  ? __switch_to_asm+0x34/0x70
  [   40.763183]  ? __switch_to_asm+0x40/0x70
  [   40.763539]  ? __switch_to_asm+0x34/0x70
  [   40.763895]  ? __switch_to_asm+0x40/0x70
  [   40.764249]  ? __switch_to_asm+0x34/0x70
  [   40.764597]  ? __switch_to_asm+0x40/0x70
  [   40.764945]  ? __switch_to_asm+0x34/0x70
  [   40.765311]  __switch_to_asm+0x40/0x70
  [   40.765884]  ? __switch_to_asm+0x34/0x70
  [   40.766239]  ? __switch_to_asm+0x40/0x70
  [   40.766619]  ? __switch_to_asm+0x34/0x70
  [   40.766972]  ? __switch_to_asm+0x40/0x70
  [   40.767323]  ? __switch_to_asm+0x34/0x70
  [   40.767677]  ? __switch_to_asm+0x40/0x70
  [   40.768024]  ? __switch_to_asm+0x34/0x70
  [   40.768375]  ? __switch_to_asm+0x40/0x70
  [   40.768725]  ? __switch_to_asm+0x34/0x70
  [   40.769516]  ? __switch_to+0x112/0x480
  [   40.769864]  ? __switch_to_asm+0x40/0x70
  [   40.770218]  ? __switch_to_asm+0x34/0x70
  [   40.771035]  ? __schedule+0x2b0/0x670
  [   40.771919]  ? schedule+0x33/0xa0
  [   40.772741]  ? prepare_exit_to_usermode+0x98/0xa0
  [   40.773398]  ? retint_user+0x8/0x8

  I'm going to see if I can instrument the test a little more to dump
  the console *after* the detach request so we get a better idea of
  what, if anything, went wrong in the guest OS.
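  As a rough sketch of that instrumentation (the helper name below is
  hypothetical; the real change would live in the test's cleanup path,
  fetching the console via e.g. novaclient's
  servers.get_console_output()), a small scanner over the console text
  can surface watchdog soft-lockup lines directly in the test logs:

```python
# Hypothetical helper: after issuing the volume detach, fetch the guest
# console output and scan it for kernel watchdog soft-lockup messages,
# so the failure mode is visible without digging through raw console dumps.
import re

SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]"
)

def find_soft_lockups(console_text):
    """Return (cpu, seconds, comm, pid) tuples, one per soft-lockup line."""
    hits = []
    for line in console_text.splitlines():
        m = SOFT_LOCKUP_RE.search(line)
        if m:
            cpu, secs, comm, pid = m.groups()
            hits.append((int(cpu), int(secs), comm, int(pid)))
    return hits

if __name__ == "__main__":
    sample = ("[   40.741525] watchdog: BUG: soft lockup - "
              "CPU#0 stuck for 22s! [run-parts:288]")
    print(find_soft_lockups(sample))  # [(0, 22, 'run-parts', 288)]
```

  Matching the soft-lockup line from this bug would then let the test
  fail with a pointed message rather than a generic detach timeout.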

  Steps to reproduce
  ==================

  nova-grenade-multinode and nova-live-migration have both hit this so
  far.

  Expected result
  ===============

  test_live_block_migration_with_attached_volume passes.

  Actual result
  =============

  test_live_block_migration_with_attached_volume fails.

  Environment
  ===========
  1. Exact version of OpenStack you are running. See the following
    list for all releases: http://docs.openstack.org/releases/

     Master.

  2. Which hypervisor did you use?
     (For example: Libvirt + KVM, Libvirt + XEN, Hyper-V, PowerKVM, ...)
     What's the version of that?

     libvirt + KVM

  3. Which storage type did you use?
     (For example: Ceph, LVM, GPFS, ...)
     What's the version of that?

     N/A

  4. Which networking type did you use?
     (For example: nova-network, Neutron with OpenVSwitch, ...)

     N/A

  Logs & Configs
  ==============

  See above.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1931702/+subscriptions


