dx-packages team mailing list archive

[Bug 1903388] [NEW] Failure to write to NVMe disk soon after boot (APST-related)

 

Public bug reported:

Hi all,
This one is similar to #1805816 and #1678184 (one was fixed, the other closed).

Symptoms:
  During regular use, the system starts failing 10 minutes to 1 hour after boot.
  Icons start disappearing and writes to disk fail.
  In-memory operations keep working for a while (switching windows, streaming video calls, typing).
  After some time the entire system crashes to a black screen that keeps looping these messages:
    ---------------------------
    EXT4-fs error (device nvme0n1p5) ext4_find_entry:1455: inode #4594258: comm gmain: reading directory lblock 0
    [... repeated about 8 times on average]
    systemd-journald[439]: Failed to write entry (9 items, 270 bytes), ignoring: Read-only file system
    [... repeated about 10 times on average]
    ---------------------------

Probable causes:
  Updated both the kernel and the BIOS 2 days ago. Unable to determine which one caused the change.
  Don't know how to determine which kernel and BIOS versions I was running before the update.
  Looks like an APST (Autonomous Power State Transition) issue, based on information from the web and previous bug reports.
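
One possible way to recover the earlier kernel version, assuming a stock Ubuntu install where dpkg records package changes in /var/log/dpkg.log (older entries rotate into dpkg.log.1 and compressed .gz files). The function name below is made up for illustration; it only filters log lines and changes nothing on the system.

```shell
# Hypothetical helper: read dpkg log lines on stdin and print
# "date time package new-version" for every kernel image install.
# Assumes the usual dpkg.log layout:
#   DATE TIME install PACKAGE:ARCH OLD-VERSION NEW-VERSION
kernel_install_history() {
    awk '$3 == "install" && $4 ~ /^linux-image-[0-9]/ { print $1, $2, $4, $6 }'
}

# Typical use on the affected machine (zcat -f also passes plain files through):
#   zcat -f /var/log/dpkg.log* | kernel_install_history | sort
#
# For the BIOS, the kernel prints a "DMI:" line with the BIOS version at every
# boot, so an earlier boot's kernel log may still show it, for example:
#   journalctl -k -b -1 | grep 'DMI:'
# This only works if logs from before the update were kept.
```

The last kernel listed before the update date is the likely "known good" candidate to boot from the GRUB "Advanced options" menu, if it is still installed.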

Verification:
  Created a rudimentary bash script that writes to a file in a loop, increasing the delay between two consecutive writes each time.
  Ran the script with:
   - different nvme_core.default_ps_max_latency_us settings
     in GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=[0|200|5500]"
   - regular boot mode
   - logged in to account
   - on battery power
  With default_ps_max_latency_us NOT SET:
     writing FAILS once the delay between writes reaches 57 to 70 seconds
  With default_ps_max_latency_us=5500:
     no write failure during a 1 hr run
  With default_ps_max_latency_us=200:
     no write failure during a 30 min run
  With default_ps_max_latency_us=0:
     no write failure during a 10 min run

This suggests an APST issue.
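
The test script itself is not attached to this report. A minimal sketch of such a write probe, under the assumption that it appends one line per iteration and lengthens the idle gap by a fixed step, could look like this. The function name, arguments and messages are hypothetical.

```shell
# Hypothetical reconstruction of the write probe; not the original script.
# Usage: write_probe FILE START_S STEP_S MAX_S   (STEP_S must be > 0)
write_probe() {
    local file=$1 delay=$2 step=$3 max=$4
    while [ "$delay" -le "$max" ]; do
        # Leave the disk idle so the controller can drop into a low-power state.
        sleep "$delay"
        # Append one line and flush it to disk; an error here is a write failure.
        if ! { printf '%s idle=%ss\n' "$(date +%T)" "$delay" >> "$file" && sync; }; then
            echo "write FAILED after ${delay}s idle"
            return 1
        fi
        delay=$((delay + step))
    done
    echo "no write failure up to ${max}s idle"
}
```

For example, `write_probe ~/apst-probe.txt 1 1 120` would walk the idle gap from 1 to 120 seconds, which covers the 57 to 70 second range where failures were seen.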

Machine:
  Lenovo ThinkPad T570
Disk:
  SAMSUNG MZVLB512HAJQ-000L7
  512 GB (512110190592 bytes)
  Firmware: 3L2QEXA7
  Serial#: S3TNNE0K119126
System:
  OS 1: Ubuntu 18.04.5 LTS
        Kernel: 4.15.0-122-generic #124-Ubuntu SMP Thu Oct 15 13:03:05 UTC 2020 x86_64
  OS 2: Windows 7 (on a separate partition on the same disk, dual-booted with GRUB).

Actions taken:
  Successfully checked the partitions for errors by running "Check partition" and "Repair partition" in the Disks utility in Ubuntu, booted from a USB stick.
  Starting in "recovery mode" yields errors (among other suspicious behavior):
     --------------------
     sd 0:0:0:0: Attached scsi generic sg0 type 0
     sd 0:0:0:0: [sda] Attached SCSI removable Disk
      input: TPPS/2 IBM TrackPoint as /devices/platform/18042/serio1/serio2/input/input ....
     nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
     nvme 0000:40:00.0: enabling device (0000 -> 0002)
     nvme nvme0: Removing after probe failure status: -19
     nvme0n1: detected capacity change from 512110190592 to 0
     print_req_error: I/O error, dev nvme0n1, sector 1000215040
     nvme nvme0: failed to set APST feature (-19)
     Waiting for suspend/resume device ... Begin: Running /scripts/local-block
     No devices listed in conf file were found.
     No devices listed in conf file were found.
     [repeats]
     --------------------
  Without nvme_core.default_ps_max_latency_us set: writes fail once the delay between writes reaches roughly 57 to 70 seconds, on battery.
  With nvme_core.default_ps_max_latency_us=5500: FIXES THE PROBLEM.
  With nvme_core.default_ps_max_latency_us=200: FIXES THE PROBLEM.
  With nvme_core.default_ps_max_latency_us=0: FIXES THE PROBLEM.

Previous behavior on same machine:
 The same OS with a previous kernel ran perfectly fine for the last year, "on high revs" (it's a development machine).
 It was often running on battery alone.
 Sleep and wakeup worked without issues.
 Windows 7 (dual-booted) runs on it without issues.

Misc info:
To test different settings, ran 'sudo nano /etc/default/grub' and enabled exactly one of these lines at a time:
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
  # GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=5500"
  # GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=200"
  # GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nvme_core.default_ps_max_latency_us=0"
followed by 'sudo update-grub' and a reboot.
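
After each reboot it is worth confirming that the intended value actually took effect. A small sketch, with a hypothetical helper name, that extracts the parameter from a kernel command line string:

```shell
# Hypothetical helper: print the value of nvme_core.default_ps_max_latency_us
# found in a kernel command line string, or "unset" if it is absent.
apst_latency_from_cmdline() {
    local cmdline=$1 word
    for word in $cmdline; do
        case $word in
            nvme_core.default_ps_max_latency_us=*)
                echo "${word#*=}"
                return 0 ;;
        esac
    done
    echo "unset"
}

# On the running system:
#   apst_latency_from_cmdline "$(cat /proc/cmdline)"
# The value the driver is using can also be read from sysfs while the
# nvme_core module is loaded:
#   cat /sys/module/nvme_core/parameters/default_ps_max_latency_us
```

Applied to the ProcKernelCmdLine recorded in the apport data below, this prints 0, which matches the last setting tested.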

Related bugs (links):
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1805816
--- 
ProblemType: Bug
ApportVersion: 2.20.9-0ubuntu7.17
Architecture: amd64
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC0:  vanjad     2707 F.... pulseaudio
CurrentDesktop: ubuntu:GNOME
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=UUID=23ec2501-e6b9-41c8-84ae-7098b3721cc9
InstallationDate: Installed on 2018-05-30 (892 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
MachineType: LENOVO 20H9004HSC
NonfreeKernelModules: nvidia_modeset nvidia
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-122-generic root=UUID=d28c5695-65bc-4c81-ac20-0a8291f03147 ro quiet splash nvme_core.default_ps_max_latency_us=0 vt.handoff=1
ProcVersionSignature: Ubuntu 4.15.0-122.124-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-122-generic N/A
 linux-backports-modules-4.15.0-122-generic  N/A
 linux-firmware                              1.173.19
Tags:  wayland-session bionic
Uname: Linux 4.15.0-122-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip docker libvirt lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 12/03/2019
dmi.bios.vendor: LENOVO
dmi.bios.version: N1VET52W (1.42 )
dmi.board.asset.tag: Not Available
dmi.board.name: 20H9004HSC
dmi.board.vendor: LENOVO
dmi.board.version: SDK0J40697 WIN
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: None
dmi.modalias: dmi:bvnLENOVO:bvrN1VET52W(1.42):bd12/03/2019:svnLENOVO:pn20H9004HSC:pvrThinkPadT570:rvnLENOVO:rn20H9004HSC:rvrSDK0J40697WIN:cvnLENOVO:ct10:cvrNone:
dmi.product.family: ThinkPad T570
dmi.product.name: 20H9004HSC
dmi.product.version: ThinkPad T570
dmi.sys.vendor: LENOVO

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: apport-collected apst bionic nvme samsung wayland-session

** Attachment added: "output of $ sudo nvme list"
   https://bugs.launchpad.net/bugs/1903388/+attachment/5432134/+files/info-nvme-list.txt

-- 
You received this bug notification because you are a member of DX
Packages, which is subscribed to compiz-plugins-main in Ubuntu.
Matching subscriptions: dx-packages
https://bugs.launchpad.net/bugs/1903388
