group.of.nepali.translators team mailing list archive
Message #15404
[Bug 1668129] Re: Amazon I3 Instance Buffer I/O error on dev nvme0n1
> This bug is still present on 14.04 using linux-generic-lts-xenial
> kernel 4.4.0-87-generic.
That's correct, and there is no planned change for the standard kernel.
Only the linux-aws kernel is being changed to address this issue, by
disabling Xen memory ballooning, as described in comment 50.
A bit more detail on the issue:
1. The AWS Xen hypervisor boots Linux and provides an e820 memory map and a Xen balloon target.
2. The Ubuntu kernel boots and sets up all memory listed in the e820 map.
3. The Xen balloon driver notices that total memory doesn't quite match its target, and so requests some pages from the Xen hypervisor.
4. The AWS Xen hypervisor allows the Ubuntu kernel balloon driver to have exactly 11 more pages, which are registered with the Ubuntu kernel as hotplugged memory (the hypervisor rejects requests for any more balloon pages).
5. The new balloon hotplugged pages are enabled (via udev, kernel config, or sysfs), which makes them available for general use.
6. If any NVMe I/O operation uses any of those 11 balloon pages for DMA, the hypervisor sees that the page's physical address is outside its e820 map address range (because it was a hotplugged page) and fails the NVMe I/O.
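As a rough illustration of step 6 only, the check can be modelled as a range test: a page is acceptable for DMA if it lies entirely inside a range listed in the e820 map. This is a toy model, not the hypervisor's actual code; the function name `page_in_e820`, the 4 KiB page size, and all addresses below are assumptions made up for the example.

```python
PAGE_SIZE = 4096  # assumed x86 page size


def page_in_e820(page_addr, e820_ranges):
    """Return True if the whole page lies inside one e820 range.

    e820_ranges is a list of (start, end) physical addresses, end exclusive.
    """
    page_end = page_addr + PAGE_SIZE
    return any(start <= page_addr and page_end <= end
               for start, end in e820_ranges)


# Hypothetical map: a single 1 GiB range starting at address 0.
E820 = [(0x0, 0x4000_0000)]

boot_page = 0x1000             # memory set up at boot (step 2)
hotplugged_page = 0x4000_0000  # balloon page added past the map (step 4)

print(page_in_e820(boot_page, E820))        # True: DMA would be allowed
print(page_in_e820(hotplugged_page, E820))  # False: the I/O would be failed
```

In this model, keeping the 11 hotplugged pages offline means no address outside the map is ever handed to the NVMe driver, which is what the workarounds below rely on.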
The problem here lies either in #4 or #6 above, meaning that the
hypervisor either should reject all requests for additional hotplugged
memory pages (step 4) or it should allow DMA using hotplugged memory
pages (step 6). Any change to the Ubuntu kernel is only working around
this hypervisor problem by not enabling any hotplugged pages.
AWS is well aware of this and is investigating what changes can be made
to their hypervisor, but I am not part of those discussions, so I can't
provide any more detail on whether or when AWS might fix #4, #6, or
both. I will note that the Amazon Linux kernel has Xen ballooning
disabled, and I believe the RHEL kernel does as well, so both have
only worked around this issue.
Until the AWS hypervisor is changed, there are various options to work
around the issue:
Trusty:
The Trusty 14.04 release has Xen ballooning enabled and does hotplug memory, but its udev rules do not enable the hotplugged memory, so this issue does not occur in Trusty (unless the hotplugged memory is manually enabled).
Xenial with 4.4 kernel:
The standard 4.4 kernel in Xenial has Xen ballooning enabled, because it may be desired under non-AWS Xen hypervisors. The recommended workaround is to edit the 40-vm-hotadd.rules udev rules file as described in comment 29.
Xenial with HWE kernel, or Zesty:
Starting with the 4.8 kernel, hotplugged memory is automatically onlined, so in addition to editing the udev rule as described above (for Xenial with the 4.4 kernel), you must also add a kernel boot parameter as described in comment 44.
Xenial linux-aws:
The linux-aws kernel has Xen ballooning disabled in the kernel configuration, so it will not cause any memory to be hotplugged, thus avoiding the problem; no other workaround is required when using the linux-aws kernel.
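To check whether a workaround has taken effect on a given instance, the state of each memory block can be read from sysfs. The following is a minimal sketch, assuming the standard Linux memory hotplug sysfs layout (`/sys/devices/system/memory/memoryN/state` and, on kernels that support automatic onlining, `/sys/devices/system/memory/auto_online_blocks`). The function names are invented for this example, and it does not reproduce the exact changes from comments 29 and 44.

```python
from pathlib import Path

MEMORY_SYSFS = Path("/sys/devices/system/memory")


def memory_block_states(root=MEMORY_SYSFS):
    """Map each memoryN block name to the contents of its state file."""
    states = {}
    for block in sorted(Path(root).glob("memory[0-9]*")):
        state_file = block / "state"
        if state_file.is_file():
            states[block.name] = state_file.read_text().strip()
    return states


def auto_online_policy(root=MEMORY_SYSFS):
    """Return the auto_online_blocks setting, or None if the file is absent."""
    policy_file = Path(root) / "auto_online_blocks"
    if policy_file.is_file():
        return policy_file.read_text().strip()
    return None


if __name__ == "__main__":
    states = memory_block_states()
    not_online = [name for name, state in states.items() if state != "online"]
    print("auto_online_blocks:", auto_online_policy())
    print("blocks not online:", ", ".join(not_online) or "none")
```

On an affected instance with a workaround in place, one would expect the hotplugged block to be reported as not online. With the linux-aws kernel, no block should be hotplugged in the first place.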
I am marking this as "Won't Fix" for the standard Xenial kernel.
** Changed in: linux (Ubuntu Xenial)
Status: Triaged => Won't Fix
** Changed in: linux (Ubuntu)
Status: Triaged => Won't Fix
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1668129
Title:
Amazon I3 Instance Buffer I/O error on dev nvme0n1
Status in linux package in Ubuntu:
Won't Fix
Status in linux-aws package in Ubuntu:
Fix Committed
Status in linux source package in Xenial:
Won't Fix
Status in linux-aws source package in Xenial:
Fix Committed
Bug description:
On the AWS i3 instance class, putting the new NVMe storage disks
under high I/O load causes data corruption and errors in dmesg:
[ 662.884390] blk_update_request: I/O error, dev nvme0n1, sector 120063912
[ 662.887824] Buffer I/O error on dev nvme0n1, logical block 14971093, lost async page write
[ 662.891254] Buffer I/O error on dev nvme0n1, logical block 14971094, lost async page write
[ 662.895591] Buffer I/O error on dev nvme0n1, logical block 14971095, lost async page write
[ 662.899873] Buffer I/O error on dev nvme0n1, logical block 14971096, lost async page write
[ 662.904179] Buffer I/O error on dev nvme0n1, logical block 14971097, lost async page write
[ 662.908458] Buffer I/O error on dev nvme0n1, logical block 14971098, lost async page write
[ 662.912287] Buffer I/O error on dev nvme0n1, logical block 14971099, lost async page write
[ 662.916047] Buffer I/O error on dev nvme0n1, logical block 14971100, lost async page write
[ 662.920285] Buffer I/O error on dev nvme0n1, logical block 14971101, lost async page write
[ 662.924565] Buffer I/O error on dev nvme0n1, logical block 14971102, lost async page write
[ 663.645530] blk_update_request: I/O error, dev nvme0n1, sector 120756912
<snip>
[ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744
[ 1012.755396] buffer_io_error: 194552 callbacks suppressed
[ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write
[ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write
[ 1012.763368] Buffer I/O error on dev nvme0n1, logical block 22, lost async page write
[ 1012.767271] Buffer I/O error on dev nvme0n1, logical block 23, lost async page write
[ 1012.771314] Buffer I/O error on dev nvme0n1, logical block 24, lost async page write
I was able to reproduce this with a bonnie++ stress test:
bonnie++ -d /mnt/test/ -r 1000
Linux i-0d76e144d85f487cf 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
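To see how many of these errors a test run produced, the kernel log can be summarised per device. This is a minimal sketch that matches only the "Buffer I/O error" message format shown above; the function name `count_buffer_errors` is invented for the example, and the sample lines are copied from the log in this report.

```python
import re
from collections import Counter

# Matches lines such as:
# "Buffer I/O error on dev nvme0n1, logical block 20, lost async page write"
BUFFER_ERR = re.compile(r"Buffer I/O error on dev ([^,\s]+), logical block (\d+)")


def count_buffer_errors(lines):
    """Count 'Buffer I/O error' messages per device name."""
    counts = Counter()
    for line in lines:
        match = BUFFER_ERR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[ 1012.752265] blk_update_request: I/O error, dev nvme0n1, sector 3744",
    "[ 1012.755398] Buffer I/O error on dev nvme0n1, logical block 20, lost async page write",
    "[ 1012.759248] Buffer I/O error on dev nvme0n1, logical block 21, lost async page write",
]

for device, count in count_buffer_errors(sample).items():
    print(device, count)  # prints "nvme0n1 2"
```

The same function can be given the lines of saved `dmesg` output instead of the sample list. Note that the kernel rate-limits these messages (see "callbacks suppressed" above), so the count is a lower bound.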
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Feb 27 02:12 seq
crw-rw---- 1 root audio 116, 33 Feb 27 02:12 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.20.1-0ubuntu2.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
DistroRelease: Ubuntu 16.04
Ec2AMI: ami-bc62b2aa
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1d
Ec2InstanceType: i3.2xlarge
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
IwConfig: Error: [Errno 2] No such file or directory
JournalErrors:
Error: command ['journalctl', '-b', '--priority=warning', '--lines=1000'] failed with exit code 1: Hint: You are currently not seeing messages from other users and the system.
Users in the 'systemd-journal' group can see all messages. Pass -q to
turn off this notice.
No journal files were opened due to insufficient permissions.
Lsusb: Error: command ['lsusb'] failed with exit code 1:
MachineType: Xen HVM domU
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
TERM=screen-256color
PATH=(custom, no user)
XDG_RUNTIME_DIR=<set>
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-64-generic root=UUID=cfda0544-9803-41e7-badb-43563085ff3a ro console=tty1 console=ttyS0
ProcVersionSignature: Ubuntu 4.4.0-64.85-generic 4.4.44
RelatedPackageVersions:
linux-restricted-modules-4.4.0-64-generic N/A
linux-backports-modules-4.4.0-64-generic N/A
linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory
Tags: xenial ec2-images
Uname: Linux 4.4.0-64-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
WifiSyslog:
_MarkForUpload: True
dmi.bios.date: 12/12/2016
dmi.bios.vendor: Xen
dmi.bios.version: 4.2.amazon
dmi.chassis.type: 1
dmi.chassis.vendor: Xen
dmi.modalias: dmi:bvnXen:bvr4.2.amazon:bd12/12/2016:svnXen:pnHVMdomU:pvr4.2.amazon:cvnXen:ct1:cvr:
dmi.product.name: HVM domU
dmi.product.version: 4.2.amazon
dmi.sys.vendor: Xen
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129/+subscriptions