kernel-packages team mailing list archive

[Bug 1534345] Re: Ubuntu 15.10 Crashing Frequently on EC2 Instances w/ Enhanced Networking

 

Thanks for looking into this, Stefan! We were completely fine with 15.04
and 3.19. If it won't break anything terribly, I can try putting 3.19,
4.0, and 4.1 on these machines, but each one only crashes every 24-48
hours, so it might take me several days. Which kernel would you
recommend starting with, say, 4.0 or 4.4?

Another thing that I didn't think was relevant before, but which seems
to confirm what you're saying about the per-CPU timers: AWS told me the
following after a crash where I had disabled my auto-reboot-on-alarm
triggers:

I was able to successfully get a trace - most of the vCPUs were just in
a halted state, so nothing there, but one had some potentially useful
information:

++++++++++++++++++++
VCPU 1
rip: ffffffff810c3ef5 __pv_queued_spin_lock_slowpath+0xc5
flags: 00000206 i nz p
rsp: ffff8803ff243e78
rax: 0000000000000a2a	rcx: 00000000fffffffa	rdx: 0000000000000003
rbx: ffff8803f7ef2e38	rsi: ffff8803ff243df8	rdi: 0000000000000003
rbp: ffff8803ff243ea8	 r8: 0000000000000000	 r9: ffff8803fe800000
r10: 0000000000000000	r11: ffffffff813ef2b0	r12: ffff8803ff2571c0
r13: 0000000000080000	r14: ffff88040ffa30c0	r15: 0000000000000001
cs: 0010	 ss: 0000	 ds: 0000	 es: 0000
fs: 0000 @ 00007fc1867b8700
gs: 0000 @ ffff8803ff240000/0000000000000000

cr0: 80050033
cr2: 000000a8
cr3: de15f000
cr4: 001406e0

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff810c3ef5)
41 bf 01 00 00 00 48 0f af c3 48 89 45 d0 b8 00 80 00 00 eb 0b <f3> 90 83 e8 01 0f 84 d4 00 00 00

Stack:
8c2fa8473f0f2e38 ffff8803ff2577c0 ffff8803f7ef2e10 0000000000000000
ffff8803f7ef2e10 0000000101155691 ffff8803ff243eb8 ffffffff817f0021
ffff8803ff243f38 ffffffff816e48f4 0000000101155693 000000400000012c
0000000000000024 ffff8803ff243ee0 ffff8803ff243ee0 ffff8803ff243ef0

Call Trace:
  [<ffffffff810c3ef5>] __pv_queued_spin_lock_slowpath+0xc5  <--
  [<ffffffff817f0021>] _raw_spin_lock+0x21
  [<ffffffff816e48f4>] net_rx_action+0xe4
  [<ffffffff8107f846>] __do_softirq+0xf6
  [<ffffffff817f1ddc>] do_softirq_own_stack+0x1c
++++++++++++++++++++
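
A note on reading the Code line in the dump above: the byte marked with
<..> is the one at RIP, and f3 90 is the x86 PAUSE instruction, which is
what a CPU executes while busy-waiting on a lock. That fits the
__pv_queued_spin_lock_slowpath frame at the top of the call trace. A
minimal sketch of that check follows; the helper names are made up for
illustration and are not part of any tool:

```python
# The "Code" line copied from the VCPU 1 dump; <..> marks the byte at RIP.
CODE_LINE = (
    "41 bf 01 00 00 00 48 0f af c3 48 89 45 d0 b8 00 80 00 00 eb 0b "
    "<f3> 90 83 e8 01 0f 84 d4 00 00 00"
)


def bytes_at_rip(code_line, count=2):
    """Return `count` opcode bytes starting at the <..>-marked byte."""
    tokens = code_line.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:start + count]]


def is_pause(opcode_bytes):
    """f3 90 encodes PAUSE (also written `rep; nop`) on x86."""
    return opcode_bytes[:2] == [0xF3, 0x90]


rip_bytes = bytes_at_rip(CODE_LINE)
print("bytes at RIP:", " ".join(f"{b:02x}" for b in rip_bytes))
print("spinning on PAUSE:", is_pause(rip_bytes))
```

This only shows that VCPU 1 was spinning when the dump was taken; it
does not say which lock it was waiting on or who held it.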

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1534345

Title:
  Ubuntu 15.10 Crashing Frequently on EC2 Instances w/ Enhanced
  Networking

Status in linux package in Ubuntu:
  Triaged

Bug description:
  Lots of details and history of the problem here:
  https://askubuntu.com/questions/710747/after-upgrading-to-15-10-from-15-04-ec2-webservers-have-become-very-unstable

  Ten of my webservers started crashing immediately after the 15.10
  upgrade. By "crash" I mean that Instance Status Checks fail and I can
  no longer SSH to the machine. Background daemons running on the system
  stop responding, and nothing is written to the logs.

  After weeks of working with the AWS team, I finally fixed a netconsole
  issue via "echo 7 > /proc/sys/kernel/printk" and got netconsole
  working properly, and finally have a trace:

  
  [21410.260077] general protection fault: 0000 [#1] SMP
  [21410.261976] Modules linked in: isofs xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc iptable_filter ip_tables x_tables ppdev intel_rapl iosf_mbi xen_fbfront fb_sys_fops input_leds serio_raw i2c_piix4 parport_pc 8250_fintek parport mac_hid netconsole configfs autofs4 crct10dif_pclmul crc32_pclmul cirrus syscopyarea sysfillrect sysimgblt aesni_intel ttm aes_x86_64 drm_kms_helper lrw gf128mul glue_helper ablk_helper cryptd psmouse drm ixgbevf pata_acpi floppy
  [21410.264054] CPU: 0 PID: 26957 Comm: apache2 Not tainted 4.2.0-23-generic #28-Ubuntu
  [21410.264054] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/07/2015
  [21410.264054] task: ffff8803f9809b80 ti: ffff8803f999c000 task.ti: ffff8803f999c000
  [21410.264054] RIP: 0010:[<ffffffff810e5c36>]  [<ffffffff810e5c36>] run_timer_softirq+0x116/0x2d0
  [21410.264054] RSP: 0000:ffff8803ff203e98  EFLAGS: 00010086
  [21410.264054] RAX: dead000000200200 RBX: ffff8803ff20e9c0 RCX: ffff8803ff203ec8
  [21410.264054] RDX: ffff8803ff203ec8 RSI: 0000000000011fc0 RDI: ffff8803ff20e9c0
  [21410.264054] RBP: ffff8803ff203f08 R08: 000000000000a77a R09: 0000000000000000
  [21410.264054] R10: 0000000000000020 R11: 0000000000000004 R12: 000000000000007c
  [21410.264054] R13: ffffffff8172aaf0 R14: 0000000000000000 R15: ffff8803af955be0
  [21410.264054] FS:  00007fb0ce6e8780(0000) GS:ffff8803ff200000(0000) knlGS:0000000000000000
  [21410.264054] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [21410.264054] CR2: 00007fb0ce51e130 CR3: 00000003fb233000 CR4: 00000000001406f0
  [21410.264054] Stack:
  [21410.264054]  ffff8803ff203eb8 ffff8803ff20f5f8 ffff8803ff20f3f8 ffff8803ff20f1f8
  [21410.264054]  ffff8803ff20e9f8 ffff8803af955b58 dead000000200200 00000000f60fabc0
  [21410.264054]  0000000000011fc0 0000000000000001 ffffffff81c0b0c8 0000000000000001
  [21410.264054] Call Trace:
  [21410.264054]  <IRQ>
  [21410.264054]  [<ffffffff8107f846>] __do_softirq+0xf6/0x250
  [21410.264054]  [<ffffffff8107fb13>] irq_exit+0xa3/0xb0
  [21410.264054]  [<ffffffff814a4499>] xen_evtchn_do_upcall+0x39/0x50
  [21410.264054]  [<ffffffff817f1f6b>] xen_hvm_callback_vector+0x6b/0x70
  [21410.264054]  <EOI>
  [21410.264054] Code: 81 e6 00 00 20 00 48 85 d2 48 89 45 b8 0f 85 30 01 00 00 4c 89 7b 08 0f 1f 44 00 00 49 8b 07 49 8b 57 08 48 85 c0 48 89 02 74 04 <48> 89 50 08 41 f6 47 2a 10 48 b8 00 02 20 00 00 00 ad de 49 c7
  [21410.264054] RIP  [<ffffffff810e5c36>] run_timer_softirq+0x116/0x2d0
  [21410.264054]  RSP <ffff8803ff203e98>
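
  One detail in the trace above may be worth pointing out: RAX holds
  dead000000200200, and the same value appears as an immediate in the
  Code bytes (00 02 20 00 00 00 ad de, little-endian). That looks like
  the kernel's list poison value, and because the address is
  non-canonical, writing through it raises a general protection fault
  rather than a page fault. This would be consistent with
  run_timer_softirq touching a timer entry that had already been
  unlinked, though the trace alone does not prove that. A minimal
  sketch of the check, assuming the constants from
  include/linux/poison.h for x86_64 kernels of this era (they differ in
  other versions) and using hypothetical helper names:

```python
# Assumption: constants as defined in include/linux/poison.h for x86_64
# kernels around 4.2 (POISON_POINTER_DELTA comes from
# CONFIG_ILLEGAL_POINTER_VALUE). Other kernel versions use other offsets.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = 0x00100100 + POISON_POINTER_DELTA
LIST_POISON2 = 0x00200200 + POISON_POINTER_DELTA


def is_canonical(addr):
    """True if bits 63..47 are all equal (48-bit virtual addressing)."""
    top = (addr >> 47) & 0x1FFFF
    return top in (0, 0x1FFFF)


def describe(addr):
    """Label an address taken from an oops register dump."""
    if addr == LIST_POISON1:
        return "LIST_POISON1"
    if addr == LIST_POISON2:
        return "LIST_POISON2"
    return "canonical" if is_canonical(addr) else "non-canonical"


RAX = 0xDEAD000000200200  # RAX from the trace
RBX = 0xFFFF8803FF20E9C0  # RBX from the trace, an ordinary kernel pointer

print("RAX:", describe(RAX), "| canonical:", is_canonical(RAX))
print("RBX:", describe(RBX), "| canonical:", is_canonical(RBX))
```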

  I don't have a vmcore at the moment, but I'm trying to get one from
  AWS and should have one in the next couple of days. This has been
  happening frequently and repeatedly since I first upgraded to 15.10 in
  early December.

  
  ubuntu@xxx-web-xx:~$ lsb_release -a
  No LSB modules are available.
  Distributor ID:	Ubuntu
  Description:	Ubuntu 15.10
  Release:	15.10
  Codename:	wily
  ubuntu@xxx-web-xx:~$ uname -a
  Linux xxx-web-xx 4.2.0-23-generic #28-Ubuntu SMP Sun Dec 27 17:47:31 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
  ubuntu@xxx-web-xx:~$

  ProblemType: Bug
  DistroRelease: Ubuntu 15.10
  Package: linux-image-4.2.0-23-generic 4.2.0-23.28
  ProcVersionSignature: Ubuntu 4.2.0-23.28-generic 4.2.6
  Uname: Linux 4.2.0-23-generic x86_64
  AlsaDevices:
   total 0
   crw-rw---- 1 root audio 116,  1 Jan 14 15:42 seq
   crw-rw---- 1 root audio 116, 33 Jan 14 15:42 timer
  AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
  ApportVersion: 2.19.1-0ubuntu5
  Architecture: amd64
  ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
  AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
  Date: Thu Jan 14 21:31:14 2016
  Ec2AMI: ami-d5e7adbf
  Ec2AMIManifest: (unknown)
  Ec2AvailabilityZone: us-east-1d
  Ec2InstanceType: m4.xlarge
  Ec2Kernel: unavailable
  Ec2Ramdisk: unavailable
  IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
  Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
  MachineType: Xen HVM domU
  PciMultimedia:
   
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   XDG_RUNTIME_DIR=<set>
   LANG=en_US.UTF-8
   SHELL=/bin/bash
  ProcFB:
   0 cirrusdrmfb
   1 xen
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.2.0-23-generic root=UUID=9bd55602-81dd-4868-8cfc-b7d63f8f8d7e ro console=tty1 console=ttyS0 crashkernel=256M@0M
  RelatedPackageVersions:
   linux-restricted-modules-4.2.0-23-generic N/A
   linux-backports-modules-4.2.0-23-generic  N/A
   linux-firmware                            1.149.3
  RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
  SourcePackage: linux
  UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
  UpgradeStatus: Upgraded to wily on 2015-12-15 (29 days ago)
  dmi.bios.date: 12/07/2015
  dmi.bios.vendor: Xen
  dmi.bios.version: 4.2.amazon
  dmi.chassis.type: 1
  dmi.chassis.vendor: Xen
  dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/07/2015:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
  dmi.product.name: HVM domU
  dmi.product.version: 4.2.amazon
  dmi.sys.vendor: Xen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1534345/+subscriptions

