group.of.nepali.translators team mailing list archive

[Bug 1651602] Re: [2.1.1] MAAS has nvme0n1 set as boot disk, curtin fails

 

After more troubleshooting with cgregan, we've narrowed this down
further.

We ran the following script on the node that was having trouble:

https://gist.github.com/pontillo/0b92a7da2fba43fb5dce705be2dcf38b

Unlike all the other devices MAAS works with, the Intel NVMe device
reports a serial number that cannot be found anywhere in
/dev/disk/by-id/*. When curtin is supplied a serial number, it uses a
heuristic to find the device as follows:

http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/commands/block_meta.py#L270

http://bazaar.launchpad.net/~curtin-dev/curtin/trunk/view/435/curtin/block/__init__.py#L601
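
As a rough illustration (a simplified sketch of the kind of lookup those
links point to, not curtin's actual code), the heuristic amounts to a
substring match of the serial against the link names in
/dev/disk/by-id/. The function names here are hypothetical:

```python
import os


def links_matching_serial(serial, link_names):
    """Return the by-id link names that contain the serial as a substring."""
    return sorted(name for name in link_names if serial in name)


def lookup_disk_by_serial(serial, by_id_dir="/dev/disk/by-id"):
    """Resolve a serial number to a device path via the by-id symlinks.

    Assumption: a plain substring match is good enough for illustration.
    """
    try:
        names = os.listdir(by_id_dir)
    except FileNotFoundError:
        names = []
    matches = links_matching_serial(serial, names)
    if not matches:
        raise ValueError("no disk with serial '%s' found" % serial)
    # In this sketch the first sorted match is taken; a whole-disk link
    # sorts before its own "-partN" links because it is a shorter prefix.
    return os.path.realpath(os.path.join(by_id_dir, matches[0]))
```

With only an 'nvme-INTEL' link present, no link name contains the
serial, so a lookup like this has nothing to match and fails.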

So arguably, this is a bug in how the Intel NVMe device exposes its
serial number; the way it populates /dev/disk/* leaves much to be
desired.

This is also *arguably* a bug in curtin (and maybe MAAS, since we knowingly
use the serial number even though `udevadm` can tell us that the serial
cannot be found anywhere in /dev/disk/by-id/*), in that we could do a
better job dealing with devices backed by not-so-robust kernel drivers.
But I think we shouldn't encourage bad behavior on the part of driver
writers, so I'm on the fence about whether or not we should fix it.

But mostly, I would argue that this is a bug in the Intel NVMe driver.
The way they expose the device to userland is non-standard and arguably
broken. When we ran `udevadm info -q all -n nvme0n1` on the device, we
got the following pseudo-output:

nvme0n1:
P: /devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
N: nvme0n1
S: SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
S: disk/by-id/nvme-INTEL
E: DEVLINKS=/dev/disk/by-id/nvme-INTEL /dev/SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: DEVNAME=/dev/nvme0n1
E: DEVPATH=/devices/pci0000:00/0000:00:xx.0/0000:xx:00.0/nvme/nvme0/nvme0n1
E: DEVTYPE=disk
E: ID_SERIAL=INTEL SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: ID_SERIAL_SHORT=CVMDxxxxxxxxxxxxxx
E: MAJOR=259
E: MINOR=0
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=xxxxxxx

You can see by the lines that start with "S:" and the "DEVLINKS=" line
that the way this device is exposed is very non-standard. One would
expect /dev/disk/by-id/* to contain a DEVLINK containing the serial
number. Instead they expose a 'nvme-INTEL' link, which is (IMHO) a
critical bug, because anyone expecting the things in /dev/disk/by-id/*
to be unique will be in for a big surprise when they add a second NVMe
device to a machine.
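
The check we did by eye can be sketched in Python. This is a minimal
sketch, not the gist script linked above; the function names are
hypothetical, and the sample text is the sanitized output from above,
trimmed to the relevant lines:

```python
def parse_udev_properties(text):
    """Collect the 'E: KEY=VALUE' lines of `udevadm info` output into a dict."""
    props = {}
    for line in text.splitlines():
        if line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def serial_in_by_id_links(props):
    """True if ID_SERIAL_SHORT appears in any /dev/disk/by-id/ devlink."""
    serial = props.get("ID_SERIAL_SHORT", "")
    links = props.get("DEVLINKS", "").split()
    by_id = [link for link in links if link.startswith("/dev/disk/by-id/")]
    return bool(serial) and any(serial in link for link in by_id)


SAMPLE = """\
E: DEVLINKS=/dev/disk/by-id/nvme-INTEL /dev/SSDxxxxxxxxxx_CVMDxxxxxxxxxxxxxx
E: DEVNAME=/dev/nvme0n1
E: ID_SERIAL_SHORT=CVMDxxxxxxxxxxxxxx
"""

print(serial_in_by_id_links(parse_udev_properties(SAMPLE)))  # prints False
```

On a live system, the text could instead come from running
`udevadm info -q all -n nvme0n1` and capturing its output.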

** Also affects: curtin
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu)
       Status: Invalid => New

** Changed in: linux (Ubuntu Xenial)
       Status: Fix Committed => New

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1651602

Title:
  Intel NVMe driver does not expose consistent links in /dev/disk/by-id

Status in curtin:
  New
Status in MAAS:
  Won't Fix
Status in linux package in Ubuntu:
  Incomplete
Status in linux source package in Xenial:
  Incomplete

Bug description:
  MAAS Version 2.1.1+bzr5544-0ubuntu1 (16.10.1)
  Deploying Xenial Nodes

  1) Deploy MAAS 2.1.1 on Yakkety
  2) Associate Juju 2.1 beta3
  3) Juju deploy Kubernetes Core

  Nodes begin to deploy but fail

  Installation failed with exception: Unexpected error while running command.
  Command: ['curtin', 'block-meta', 'custom']
  Exit code: 3
  Reason: -
  Stdout: b"no disk with serial 'CVMD434500BN400AGN' found\n"
