kernel-packages team mailing list archive

[Bug 1512185] Re: qemu-nbd on ARM64 deadlock? Stuck in rt_sigtimedwait([BUS ALRM IO], ..) and futex(0x7f749ec230, FUTEX_WAIT, ...)

 

Kernel OOPS (an RCU self-detected stall) on one of the mcdivitt hosts:

| [544599.231964] block nbd0: Attempted send on closed socket
| [544599.231968] block nbd0: Attempted send on closed socket
| [544599.231972] block nbd0: Attempted send on closed socket
| [544599.231975] block nbd0: Attempted send on closed socket
| [544627.046717] INFO: rcu_sched self-detected stall on CPU { 3}  (t=15040176 jiffies g=8751031 c=8751030 q=246988591)
| [544627.046719] Task dump for CPU 3:
| [544627.046723] qemu-nbd        R  running task        0 32375      1 0x0000000a
| [544627.046724] Call trace:
| [544627.046733] [<ffffffc00008acf4>] dump_backtrace+0x0/0x170
| [544627.046737] [<ffffffc00008ae84>] show_stack+0x20/0x2c
| [544627.046740] [<ffffffc0000e066c>] sched_show_task+0xa0/0xf8
| [544627.046742] [<ffffffc0000e3c24>] dump_cpu_task+0x44/0x54
| [544627.046745] [<ffffffc00010bb90>] rcu_dump_cpu_stacks+0x98/0xec
| [544627.046747] [<ffffffc00010f350>] rcu_check_callbacks+0x410/0x740
| [544627.046751] [<ffffffc000114974>] update_process_times+0x40/0x74
| [544627.046754] [<ffffffc0001246d8>] tick_sched_handle.isra.15+0x38/0x7c
| [544627.046756] [<ffffffc000124764>] tick_sched_timer+0x48/0x84
| [544627.046758] [<ffffffc000114ff4>] __run_hrtimer+0x90/0x1d0
| [544627.046760] [<ffffffc000115ae4>] hrtimer_interrupt+0xec/0x28c
| [544627.046764] [<ffffffc00063f4b4>] arch_timer_handler_phys+0x38/0x48
| [544627.046766] [<ffffffc000105554>] handle_percpu_devid_irq+0x90/0x12c
| [544627.046769] [<ffffffc0001010c0>] generic_handle_irq+0x38/0x54
| [544627.046770] [<ffffffc000101404>] __handle_domain_irq+0x64/0xc0
| [544627.046772] [<ffffffc000082478>] gic_handle_irq+0x38/0x88
| [544627.046773] Exception stack(0xffffffc634947610 to 0xffffffc634947730)
| [544627.046776] 7600:                                     00c1a000 ffffffc0 00c1f000 ffffffc0
| [544627.046778] 7620: 34947750 ffffffc6 000ffe0c ffffffc0 00000900 00000000 000001c0 00000000
| [544627.046780] 7640: 00000005 00000000 004a1870 ffffffc0 004a1870 ffffffc0 004a433c ffffffc0
| [544627.046782] 7660: 000000ff 00000000 00b5c658 ffffffc0 6465736f 636f7320 00ba157e 00000000
| [544627.046784] 7680: 00ba1449 00000000 00000000 00000000 00000006 00000000 66666666 20666366
| [544627.046786] 76a0: 34393433 30353737 ec9de100 0038d0cb 0023ceec ffffffc0 004bc7f8 00000000
| [544627.046788] 76c0: aff5f5d0 0000007f 00c1a000 ffffffc0 00c1f000 ffffffc0 00000140 00000000
| [544627.046790] 76e0: 00c1adc0 ffffffc0 00b7c198 ffffffc0 00000001 00000000 00c1f1c0 ffffffc0
| [544627.046792] 7700: 00000003 00000000 00000000 00000000 fff47938 ffffffcf 34947750 ffffffc6
| [544627.046793] 7720: 000ffe08 ffffffc0 34947750 ffffffc6
| [544627.046795] [<ffffffc000085da4>] el1_irq+0x64/0xc0
| [544627.046798] [<ffffffc000100170>] vprintk_emit+0x33c/0x59c
| [544627.046801] [<ffffffc0004c1c4c>] dev_vprintk_emit+0xc8/0x204
| [544627.046802] [<ffffffc0004c1dfc>] dev_printk_emit+0x74/0x84
| [544627.046804] [<ffffffc0004c1e60>] __dev_printk+0x54/0x9c
| [544627.046805] [<ffffffc0004c2114>] dev_err+0x70/0x80
| [544627.046814] [<ffffffbffc3c8544>] __nbd_ioctl+0x810/0x944 [nbd]
| [544627.046817] [<ffffffbffc3c86f4>] nbd_ioctl+0x7c/0x228 [nbd]
| [544627.046821] [<ffffffc0003b2fe8>] blkdev_ioctl+0x1f4/0x770
| [544627.046825] [<ffffffc000245108>] block_ioctl+0x50/0x64
| [544627.046829] [<ffffffc00021e094>] do_vfs_ioctl+0x330/0x5d4
| [544627.046831] [<ffffffc00021e3c4>] SyS_ioctl+0x8c/0xa4
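
The stall line packs several counters into its parenthesised tail. The sketch below pulls them out so stalls can be compared across hosts. It rests on these assumptions:

- `parse_rcu_stall`, `stall_seconds` and the `RcuStall` field names are hypothetical.
- `hz=250` is an assumed CONFIG_HZ; check it against the running kernel's config.
- My understanding is that in this kernel generation `t=` counts jiffies since the stalled grace period started, `g=`/`c=` are the current and last completed grace-period numbers, and `q=` is the number of queued RCU callbacks.

```python
import re
from typing import NamedTuple, Optional

# Matches lines like:
#   INFO: rcu_sched self-detected stall on CPU { 3}  (t=15040176 jiffies g=8751031 c=8751030 q=246988591)
RCU_STALL_RE = re.compile(
    r"rcu_sched self-detected stall on CPU \{\s*(?P<cpu>\d+)\}\s*"
    r"\(t=(?P<t>\d+) jiffies g=(?P<g>\d+) c=(?P<c>\d+) q=(?P<q>\d+)\)"
)


class RcuStall(NamedTuple):
    cpu: int
    jiffies: int    # t=: jiffies since the stalled grace period began (assumed meaning)
    gp_num: int     # g=: current grace-period number
    completed: int  # c=: last completed grace-period number
    queued: int     # q=: queued RCU callbacks


def parse_rcu_stall(line: str) -> Optional[RcuStall]:
    """Return the stall counters from a log line, or None if it is not a stall line."""
    m = RCU_STALL_RE.search(line)
    if m is None:
        return None
    return RcuStall(int(m["cpu"]), int(m["t"]), int(m["g"]), int(m["c"]), int(m["q"]))


def stall_seconds(stall: RcuStall, hz: int = 250) -> float:
    """Convert t= to seconds. ASSUMPTION: CONFIG_HZ=250 unless told otherwise."""
    return stall.jiffies / hz
```

With the mcdivitt line above and the assumed HZ of 250, `t=15040176` works out to roughly 60,160 seconds (about 16.7 hours). With a different HZ the figure scales accordingly.
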

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1512185

Title:
  qemu-nbd on ARM64 deadlock? Stuck in rt_sigtimedwait([BUS ALRM IO],
  ..) and futex(0x7f749ec230, FUTEX_WAIT, ...)

Status in linux package in Ubuntu:
  Incomplete
Status in linux-meta-lts-vivid package in Ubuntu:
  New
Status in qemu package in Ubuntu:
  New

Bug description:
  Hi,

  We're often seeing qemu-nbd processes lock up on our HP Moonshot ARM64
  nova-compute nodes. At the same time, the kernel logs a burst of
  messages like the following:

  | [605282.018238] block nbd3: Attempted send on closed socket
  | [605282.018242] block nbd3: Attempted send on closed socket
  | [605282.018245] block nbd3: Attempted send on closed socket
  | [605282.018249] block nbd3: Attempted send on closed socket
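
To compare how noisy each host is, the closed-socket messages can be counted per device. The sketch below assumes only the dmesg line format shown above (`tally_closed_socket` is a hypothetical name). It returns the count plus the first and last timestamp for each nbd device.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# e.g. "[605282.018238] block nbd3: Attempted send on closed socket"
# \s* after "[" also accepts dmesg's space-padded timestamps.
CLOSED_SOCKET_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+block (?P<dev>nbd\d+): Attempted send on closed socket"
)


def tally_closed_socket(lines: List[str]) -> Dict[str, Tuple[int, float, float]]:
    """Return {device: (count, first_ts, last_ts)} for the nbd closed-socket message."""
    seen: Dict[str, List[float]] = defaultdict(list)
    for line in lines:
        m = CLOSED_SOCKET_RE.search(line)
        if m:
            seen[m["dev"]].append(float(m["ts"]))
    return {dev: (len(ts), min(ts), max(ts)) for dev, ts in seen.items()}
```

Feed it the lines of a saved dmesg dump, for example `tally_closed_socket(open("dmesg.txt").read().splitlines())`, where `dmesg.txt` is a hypothetical file name.
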

  swirlix01:

  | hloeung@swirlix01:~$ uname -a
  | Linux swirlix01 3.19.0-30-generic #34~14.04.1-Ubuntu SMP Fri Oct 2 22:15:46 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
  | hloeung@swirlix01:~$ ps afx | grep qe\\mu-nbd
  | 27782 ?        Ssl    0:00 /usr/bin/qemu-nbd -c /dev/nbd10 /var/lib/nova/instances/ba50751e-56d7-4bc4-8742-1193fe7a138e/disk
  | hloeung@swirlix01:~$ sudo cat /proc/$(ps afx | grep qe\\mu-nbd | awk '{ print $1 }')/stack
  | [<ffffffc0000875b0>] __switch_to+0x74/0x8c
  | [<ffffffc000125dac>] futex_wait_queue_me+0xf4/0x184
  | [<ffffffc0001268b4>] futex_wait+0x154/0x24c
  | [<ffffffc000128638>] do_futex+0x1a0/0x9ec
  | [<ffffffc000128f1c>] SyS_futex+0x98/0x1cc
  | [<ffffffc00008642c>] el0_svc_naked+0x20/0x28
  | [<ffffffffffffffff>] 0xffffffffffffffff

  swirlix08:

  | hloeung@swirlix08:~$ uname -a
  | Linux swirlix08 3.19.0-31-generic #36~14.04.1-Ubuntu SMP Thu Oct 8 10:50:10 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
  | hloeung@swirlix08:~$ ps afx | grep qe\\mu-nbd
  | 31976 ?        Ssl    0:00 /usr/bin/qemu-nbd -c /dev/nbd6 /var/lib/nova/instances/92ceb061-2ea4-4212-be20-ab0ded6eb3cd/disk
  | hloeung@swirlix08:~$ sudo cat /proc/$(ps afx | grep qe\\mu-nbd | awk '{ print $1 }')/stack
  | [<ffffffc0000875b0>] __switch_to+0x74/0x8c
  | [<ffffffc000125d6c>] futex_wait_queue_me+0xf4/0x184
  | [<ffffffc000126874>] futex_wait+0x154/0x24c
  | [<ffffffc0001285f8>] do_futex+0x1a0/0x9ec
  | [<ffffffc000128edc>] SyS_futex+0x98/0x1cc
  | [<ffffffc00008642c>] el0_svc_naked+0x20/0x28
  | [<ffffffffffffffff>] 0xffffffffffffffff

  swirlix11:

  | hloeung@swirlix11:~$ uname -a
  | Linux swirlix11 3.19.0-31-generic #36~14.04.1-Ubuntu SMP Thu Oct 8 10:50:10 UTC 2015 aarch64 aarch64 aarch64 GNU/Linux
  | hloeung@swirlix11:~$ ps afx | grep qe\\mu-nbd
  | 18149 ?        Ssl    0:00 /usr/bin/qemu-nbd -c /dev/nbd3 /var/lib/nova/instances/84cac137-c1e4-46ac-894a-efcd55ef7e05/disk
  | hloeung@swirlix11:~$ sudo cat /proc/$(ps afx | grep qe\\mu-nbd | awk '{ print $1 }'/stack
  | hloeung@swirlix11:~$ sudo cat /proc/$(ps afx | grep qe\\mu-nbd | awk '{ print $1 }')/stack
  | [<ffffffc0000875b0>] __switch_to+0x74/0x8c
  | [<ffffffc000125d6c>] futex_wait_queue_me+0xf4/0x184
  | [<ffffffc000126874>] futex_wait+0x154/0x24c
  | [<ffffffc0001285f8>] do_futex+0x1a0/0x9ec
  | [<ffffffc000128edc>] SyS_futex+0x98/0x1cc
  | [<ffffffc00008642c>] el0_svc_naked+0x20/0x28
  | [<ffffffffffffffff>] 0xffffffffffffffff
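
The commands above have two limitations:

- `/proc/PID/stack` only shows the kernel stack of the process's main thread, the task whose TID equals the PID. A qemu-nbd process has more than one thread, so the others are not covered; their stacks live under `/proc/PID/task/TID/stack`.
- `$(ps afx | grep ... | awk ...)` expands to several PIDs once more than one qemu-nbd is running.

The Python sketch below collects the kernel stack of every thread of every qemu-nbd process. `qemu_nbd_thread_stacks` is a hypothetical helper. Reading `stack` needs root, as with the `sudo cat` above.

```python
import os
from typing import Dict, List


def qemu_nbd_thread_stacks(proc_root: str = "/proc",
                           comm: str = "qemu-nbd") -> Dict[int, Dict[int, List[str]]]:
    """Map pid -> {tid: kernel stack lines} for every process whose comm matches.

    Processes that exit mid-scan, or whose stacks we may not read, are skipped.
    """
    result: Dict[int, Dict[int, List[str]]] = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        pid_dir = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(pid_dir, "comm")) as f:
                if f.read().strip() != comm:
                    continue
            threads: Dict[int, List[str]] = {}
            task_dir = os.path.join(pid_dir, "task")
            # One subdirectory per thread, named by TID.
            for tid in sorted(os.listdir(task_dir), key=int):
                with open(os.path.join(task_dir, tid, "stack")) as f:
                    threads[int(tid)] = f.read().splitlines()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue
        result[int(entry)] = threads
    return result
```

Run as root, this prints each TID's stack side by side, which shows what the non-main threads are blocked on. From a shell, `pgrep -x qemu-nbd` is a more robust way than the grep/awk pipeline to get the PIDs.
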

  | hloeung@swirlix11:~$ sudo strace -f -p 18149
  | Process 18149 attached with 3 threads
  | [pid 18150] rt_sigtimedwait([BUS ALRM IO], NULL, NULL, 8 <unfinished ...>
  | [pid 18149] futex(0x7f749ec230, FUTEX_WAIT, 18152, NULL
  | ... (hangs here) ...
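
The two strace lines probably show different things:

- **Thread 18150.** `rt_sigtimedwait([BUS ALRM IO], NULL, NULL, 8)` is how glibc's `sigwait()` appears on a 64-bit kernel: a NULL timeout (wait forever) and an 8-byte signal set. That is a thread dedicated to waiting for those signals, which is normal while idle.
- **Main thread 18149.** `futex(0x7f749ec230, FUTEX_WAIT, 18152, NULL` is consistent with `pthread_join()` waiting for thread 18152 to exit. To my understanding, glibc waits on the joined thread's TID word with the TID as the expected value. If so, the interesting stack is `/proc/18149/task/18152/stack`.

The sketch below reproduces only the first pattern in Python: block the three signals, then have one thread `sigwait()` on them. It is an illustration, not QEMU's code. `start_signal_thread` and `WAITED` are hypothetical names.

```python
import signal
import threading

# The signal set seen in the strace output: [BUS ALRM IO]
WAITED = {signal.SIGBUS, signal.SIGALRM, signal.SIGIO}


def start_signal_thread(results: list) -> threading.Thread:
    """Block WAITED in the calling thread, then start a thread that sigwait()s on it.

    New threads inherit the creator's signal mask, so the signals stay blocked
    and are consumed only by the waiting thread.
    """
    signal.pthread_sigmask(signal.SIG_BLOCK, WAITED)

    def waiter() -> None:
        # On Linux/glibc this is expected to appear in strace as
        # rt_sigtimedwait(<set>, NULL, NULL, 8): no timeout, wait forever.
        results.append(signal.sigwait(WAITED))

    t = threading.Thread(target=waiter, name="sigwait-thread")
    t.start()
    return t
```

To exercise it:

1. Call `t = start_signal_thread(results)`.
2. Call `signal.pthread_kill(t.ident, signal.SIGALRM)`.

After `t.join()`, `results` should hold `signal.SIGALRM`. Running it under `strace -f` should show the waiting thread parked in an `rt_sigtimedwait` call like the one above.
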

  We're using the QEMU package backported from Vivid, as per LP:1457639:

  | hloeung@swirlix11:~$ apt-cache policy qemu-utils
  | qemu-utils:
  |   Installed: 1:2.2+dfsg-5expubuntu9.5+bug1457639~ubuntu14.04.1
  |   Candidate: 1:2.2+dfsg-5expubuntu9.5+bug1457639~ubuntu14.04.1
  |   Version table:
  |  *** 1:2.2+dfsg-5expubuntu9.5+bug1457639~ubuntu14.04.1 0
  |         500 http://ppa.launchpad.net/canonical-is-sa/arm64-infra-workarounds/ubuntu/ trusty/main arm64 Packages

  I'm also not sure if this is related to LP:1505564, which is for amd64/x86_64.
  --- 
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Oct 25 17:42 seq
   crw-rw---- 1 root audio 116, 33 Oct 25 17:42 timer
  AplayDevices: Error: [Errno 2] No such file or directory
  ApportVersion: 2.14.1-0ubuntu3.18
  Architecture: arm64
  ArecordDevices: Error: [Errno 2] No such file or directory
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  CRDA: Error: [Errno 2] No such file or directory
  DistroRelease: Ubuntu 14.04
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  Package: qemu (not installed)
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_GB
   SHELL=/bin/bash
  ProcFB:
   
  ProcKernelCmdLine: console=ttyS0,9600n8r ro
  ProcVersionSignature: Ubuntu 3.19.0-31.36~14.04.1-generic 3.19.8-ckt7
  RfKill: Error: [Errno 2] No such file or directory
  Tags: trusty uec-images
  Uname: Linux 3.19.0-31-generic aarch64
  UnreportableReason: This is not an official Ubuntu package. Please remove any third party package and try again.
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: adm
  _MarkForUpload: True

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1512185/+subscriptions

