kernel-packages team mailing list archive

[Bug 1506543] Re: openstack instances on arm64 lock up on polling and other situations fairly repeatably

 

The only scenario we've thoroughly tested is a 3.13 guest on a 3.19 host
on mcdivitts (Moonshot X-Genes). We haven't tested 3.19 in the guest
enough to rule it out, and we need at least 3.19 on the host for guest
UEFI support.

There's normally nothing in dmesg during the hang, though I did see this
once:

Oct 15 09:59:07 dogfood-bos01-arm64-003 kernel: [ 3840.420637] INFO: task tcpdump:2023 blocked for more than 120 seconds.

I've never seen an instance totally hang, but our buildds more often
than not get stuck in ntpdate:

socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 3
setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
socket(PF_INET6, SOCK_DGRAM, IPPROTO_UDP) = 4
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(4, SOL_IPV6, IPV6_V6ONLY, [1], 4) = 0
fcntl(4, F_SETFL, O_RDONLY|O_NONBLOCK)  = 0
rt_sigaction(SIGALRM, {0x557a3ad8d8, [], 0}, {SIG_DFL, [], 0}, 8) = 0
setitimer(ITIMER_REAL, {it_interval={0, 200000}, it_value={0, 100000}}, NULL) = 0
setpriority(PRIO_PROCESS, 0, 4294967284) = -1 EACCES (Permission denied)
ppoll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}], 2, {60, 0}, NULL, 0

The ppoll never returns despite having a 60s timeout set, and the 100ms
timer never fires either. Other ntpdates also hang there, but general
shell operations continue to work fine.
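
As a rough starting point for a smaller reproducer, the sketch below
mimics the shape of the trace above: two idle UDP sockets, a repeating
ITIMER_REAL, and a poll with a timeout. It is not ntpdate's code. The
function name poll_with_timer and its parameters are made up for
illustration, it assumes Python 3 is available in the guest, and Python
calls poll() rather than ppoll() directly, so it may not exercise exactly
the same kernel path.

```python
import select
import signal
import socket
import time


def poll_with_timer(timeout_s=60.0, first_tick_s=0.1, interval_s=0.2):
    """Poll idle UDP sockets while an interval timer is running.

    Returns (elapsed_seconds, sigalrm_count, ready_fds).
    Hypothetical helper; must be called from the main thread.
    """
    ticks = 0

    def on_alarm(signum, frame):
        nonlocal ticks
        ticks += 1

    socks = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM)]
    try:
        socks.append(socket.socket(socket.AF_INET6, socket.SOCK_DGRAM))
    except OSError:
        pass  # IPv6 not available here; continue with IPv4 only

    poller = select.poll()
    for s in socks:
        s.setblocking(False)
        poller.register(s.fileno(), select.POLLIN)

    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, first_tick_s, interval_s)
    start = time.monotonic()
    try:
        # Nothing is ever sent to these sockets, so a healthy kernel
        # should return an empty list once the timeout expires.
        ready = poller.poll(int(timeout_s * 1000))
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, old_handler)
        for s in socks:
            s.close()
    return time.monotonic() - start, ticks, ready


if __name__ == "__main__":
    elapsed, ticks, ready = poll_with_timer(timeout_s=5.0)
    print("elapsed=%.2fs ticks=%d ready=%r" % (elapsed, ticks, ready))
```

On a healthy system the call should come back after roughly the timeout
with a non-zero tick count. If an affected guest behaves as in the trace,
it would presumably never return; that has not been verified here.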

A non-buildd instance left for 24 hours had apache2 and various other
daemons all stuck in epoll and similar.

I'll try to gather more logs and devise an easier reproducer than "run a
buildd".

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1506543

Title:
  openstack instances on arm64 lock up on polling and other situations
  fairly repeatably

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  While working on deploying arm64 builders in openstack, we found that
  it was pretty easy to wedge them.  Many hang just running ntpdate
  right off the bat.

  William Grant had more direct tests he did, but upgrading from 3.13 to
  3.19 didn't seem to fix things.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1506543/+subscriptions

