curtin-dev team mailing list archive

[Bug 1900695] Re: MAAS fails to deploy HPE DL380 Gen10 when virtual install drive is enabled

 

It looks like the deployment works; what's failing is booting into the
deployed system. There appear to be two bugs here:

1. When a deployment occurs Curtin configures the system to boot locally after trying to boot over the network. This doesn't appear to be happening.
2. GRUB isn't able to see any of the local disks.

When GRUB fails to find a local bootloader, it falls back on booting the
next configured device. This should be the local system, but because
Curtin never configures local boot, the system firmware is started instead.

** Also affects: curtin
   Importance: Undecided
       Status: New

** Also affects: grub
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of curtin
developers, which is subscribed to curtin.
https://bugs.launchpad.net/bugs/1900695

Title:
  MAAS fails to deploy HPE DL380 Gen10 when virtual install drive is
  enabled

Status in curtin:
  New
Status in grub:
  New
Status in MAAS:
  New

Bug description:
  # ENVIRONMENT
  MAAS version (SNAP):
  maas 2.8.2-8577-g.a3e674063 8980 2.8/stable canonical✓ -

  MAAS was cleanly installed. KVM POD setup works.

  MAAS status:
  bind9 RUNNING pid 9258, uptime 15:13:02
  dhcpd RUNNING pid 26173, uptime 15:09:30
  dhcpd6 STOPPED Not started
  http RUNNING pid 19526, uptime 15:10:49
  ntp RUNNING pid 27147, uptime 14:02:18
  proxy RUNNING pid 25909, uptime 15:09:33
  rackd RUNNING pid 7219, uptime 15:13:20
  regiond RUNNING pid 7221, uptime 15:13:20
  syslog RUNNING pid 19634, uptime 15:10:48

  Machine:
  HPE DL380 Gen10
  Storage - commissioning output:
  "NAME": "sda", (virtual install drive)
    "MODEL": "LUN 00 Media 0",
    /devices/pci0000:00/0000:00:14.0/usb2/2-3/2-3.1/2-3.1:1.0/host0/target0:0:0/0:0:0:0/block/sda
    "SIZE": "536870912",
  "NAME": "sdb", Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1
    "MODEL": "LOGICAL VOLUME",
    "PATH": "/dev/sdb",
    "DEVPATH": "/devices/pci0000:5b/0000:5b:00.0/0000:5c:00.0/host1/target1:1:0/1:1:0:0/block/sdb",
    "SIZE": "960163569664", 
  "NAME": "sdc", (HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2)
    "MODEL": "LOGICAL VOLUME",
    "PATH": "/dev/sdc",
    "DEVPATH": "/devices/pci0000:5b/0000:5b:00.0/0000:5c:00.0/host1/target1:1:0/1:1:0:1/block/sdc",
    "SIZE": "480070426624",
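
  The byte sizes in the commissioning output can be matched against the
  GiB figures in the controller labels with a little arithmetic. The
  sketch below is illustrative only; the helper name to_gib is an
  assumption and not part of MAAS or curtin.

```python
# Convert the "SIZE" values from the commissioning output (bytes) to GiB
# so they can be compared with the sizes shown in the RAID volume labels.
GIB = 2 ** 30


def to_gib(size_bytes):
    """Return the size in GiB, rounded to one decimal place (hypothetical helper)."""
    return round(size_bytes / GIB, 1)


# Values copied from the commissioning output above.
sizes = {
    "sda": 536870912,      # virtual install drive
    "sdb": 960163569664,   # RAID1 Logical Drive 1
    "sdc": 480070426624,   # RAID1 Logical Drive 2
}

sizes_gib = {name: to_gib(size) for name, size in sizes.items()}

for name, gib in sizes_gib.items():
    print(f"{name}: {gib} GiB")
```

  This gives 0.5 GiB for sda (the 512 MiB virtual install drive),
  894.2 GiB for sdb and 447.1 GiB for sdc, which agrees with the
  logical drive labels.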

  # PROBLEM DESCRIPTION

  MAAS fails to reboot into the deployed OS. The "Local" menu entry in the MAAS-provided grub.cfg fails to find a bootloader on the local drives, so GRUB exits and the firmware falls back to the EFI boot order.
  Root cause:

  0) identify install device:
  2020-10-20T06:56:37+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb({'grub_device': True, 'id': 'sdb', 'model': 'LOGICAL VOLUME', 'name': 'sdb', 'ptable': 'gpt', 'serial': '600508b1001cade9268ac61a1c3cee4b', 'type': 'disk', 'wipe': 'superblock'})

  Grub is configured not to touch NVRAM:
  2020-10-20T06:57:01+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Transferred {'grub2': 'grub2   grub2/update_nvram  boolean false',

  1) MAAS installs grub on the machine:
  2020-10-20T06:57:02+00:00 cmp3az2cz20300kv8 cloud-init[2459]: start: cmd-install/stage-curthooks/builtin/cmd-curthooks: Installing packages on target system: ['efibootmgr', 'grub-efi-amd64', 'grub-efi-amd64-signed', 'shim-signed']
  2020-10-20T06:57:09+00:00 cmp3az2cz20300kv8 cloud-init[2459]: finish: cmd-install/stage-curthooks/builtin/cmd-curthooks: SUCCESS: Installing packages on target system: ['efibootmgr', 'grub-efi-amd64', 'grub-efi-amd64-signed', 'shim-signed']

  2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: start: cmd-install/stage-curthooks/builtin/cmd-curthooks/install-grub: installing grub to target devices
  2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: setup grub on target /tmp/tmpxf91lob9/target
  2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Found primary UEFI ESP: sdb-part1
  2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Found UEFI ESP(s) for grub install: ['sdb-part1']
  2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb-part1({'device': 'sdb', 'flag': 'boot', 'id': 'sdb-part1', 'name': 'sdb-part1', 'number': 1, 'offset': '4194304B', 'size': '536870912B', 'type': 'partition', 'uuid': '17649a3f-6e9a-445c-a20a-74914d4c5f88', 'wipe': 'superblock'})
  2020-10-20T06:57:43+00:00 cmp3az2cz20300kv8 cloud-init[2459]: get_path_to_storage_volume for volume sdb({'grub_device': True, 'id': 'sdb', 'model': 'LOGICAL VOLUME', 'name': 'sdb', 'ptable': 'gpt', 'serial': '600508b1001cade9268ac61a1c3cee4b', 'type': 'disk', 'wipe': 'superblock'})

  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Applying grub debconf_selections config:
  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: {'debconf_selections': {'grub': 'grub-pc grub-efi/install_devices multiselect /dev/disk/by-id/scsi-3600508b1001cade9268ac61a1c3cee4b-part1'}}
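
  To confirm that the debconf selection points at the ESP on the disk
  identified in step 0, the selection line can be split into its four
  fields (owner, question, type, value) and compared with the disk
  serial from the log. This is a minimal sketch; parse_selection is a
  hypothetical helper, not a curtin function.

```python
# Split a debconf selection line into owner, question, type and value,
# then check that the value refers to partition 1 of the expected disk.

def parse_selection(line):
    """Parse '<owner> <question> <type> <value>' (hypothetical helper)."""
    owner, question, qtype, value = line.split(None, 3)
    return {"owner": owner, "question": question, "type": qtype, "value": value}


# Both strings are copied from the log lines above.
selection = (
    "grub-pc grub-efi/install_devices multiselect "
    "/dev/disk/by-id/scsi-3600508b1001cade9268ac61a1c3cee4b-part1"
)
disk_serial = "600508b1001cade9268ac61a1c3cee4b"

parsed = parse_selection(selection)
targets_expected_disk = disk_serial in parsed["value"]
targets_first_partition = parsed["value"].endswith("-part1")

print(parsed["question"], "->", parsed["value"])
print("matches sdb serial:", targets_expected_disk)
print("first partition:", targets_first_partition)
```

  Both checks hold for the values in this log, so the install device
  selection itself is consistent with the ESP found on sdb-part1.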

  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: installing grub to target=/tmp/tmpxf91lob9/target devices=['/dev/sdb1'] [replace_defaults=None]
  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpxf91lob9/target', 'dpkg', '--print-architecture'] with allowed return codes [0] (capture=True)
  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: grub: moved /tmp/tmpxf91lob9/target/etc/default/grub.d/50-cloudimg-settings.cfg out of the way
  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: updated /tmp/tmpxf91lob9/target/etc/default/grub to set: GRUB_CMDLINE_LINUX_DEFAULT="console=tty0 console=ttyS0,115200n8 nvme_core.multipath=0"
  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Using grub install command: grub-install
  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Grub install cmds:
  2020-10-20T06:57:44+00:00 cmp3az2cz20300kv8 cloud-init[2459]: [['efibootmgr', '-v'], ['dpkg-reconfigure', 'grub-efi-amd64'], ['update-grub'], ['grub-install', '--target=x86_64-efi', '--efi-directory=/boot/efi', '--bootloader-id=ubuntu', '--recheck', '--no-nvram'], ['efibootmgr', '-v']]

  2020-10-20T06:57:46+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running
  command ['unshare', '--fork', '--pid', '--', 'chroot',
  '/tmp/tmpxf91lob9/target', 'grub-install', '--target=x86_64-efi',
  '--efi-directory=/boot/efi', '--bootloader-id=ubuntu', '--recheck',
  '--no-nvram'] with allowed return codes [0] (capture=True)

  2) MAAS sets up the boot order to ensure PXE boot:
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Setting currently booted 0016 as the first UEFI loader.
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: New UEFI boot order: 0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009

  Note that the boot order set is:
  0016 - NIC (PXE IPv4)
  0000 - fail to system utilities

  The device where the OS is installed (Boot000B) is further down in
  the boot order.

  Consult below:
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpxf91lob9/target', 'efibootmgr', '-o', '0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009'] with allowed return codes [0] (capture=False)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: BootCurrent: 0016
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Timeout: 0 seconds
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: BootOrder: 0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,0001,0002,0003,0004,0005,0006,0007,0008,0009
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0000* System Utilities
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0001  Embedded UEFI Shell
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0002  Diagnose Error
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0003  Intelligent Provisioning
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0004  Boot Menu
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0005  Network Boot
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0006  View Integrated Management Log
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0007  HTTP Boot
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0008  PXE Boot
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0009  Embedded Diagnostics
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000A* Generic USB Boot
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000B* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2(Target:0, Lun:1)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot000C* Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1(Target:0, Lun:0)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0010* Slot 1 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0012* Slot 1 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0014* Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (HTTP(S) IPv4)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0016* Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (PXE IPv4)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0018* Slot 4 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001A* Slot 4 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001C* Slot 3 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (HTTP(S) IPv4)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot001E* Slot 3 Port 1 : HPE Ethernet 10Gb 2-port 562SFP+ Adapter - NIC (PXE IPv4)
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: Boot0020  Trigger ready-to-boot event
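
  The consequence of this boot order can be traced with a short sketch
  that looks up what the firmware tries after the PXE entry. The
  BootOrder string and labels are copied from the efibootmgr output
  above; the function name next_after is an assumption made for
  illustration.

```python
# Determine which entry the firmware falls through to once the currently
# booted PXE entry (0016) hands control back, and where the disks sit.

boot_order = (
    "0016,0000,000B,000C,0018,001A,0010,0012,001C,001E,000A,0014,"
    "0001,0002,0003,0004,0005,0006,0007,0008,0009"
).split(",")

# Labels for the entries of interest, copied from the listing above.
labels = {
    "0016": "Embedded FlexibleLOM 1 Port 1 : HPE Ethernet 1Gb 4-port 366FLR Adapter - NIC (PXE IPv4)",
    "0000": "System Utilities",
    "000B": "Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 447.1 GiB, RAID1 Logical Drive 2(Target:0, Lun:1)",
    "000C": "Embedded RAID 1 : HPE Smart Array P816i-a SR Gen10 - 894.2 GiB, RAID1 Logical Drive 1(Target:0, Lun:0)",
}


def next_after(order, current):
    """Return the entry following `current` in the boot order, or None."""
    idx = order.index(current)
    return order[idx + 1] if idx + 1 < len(order) else None


fallback = next_after(boot_order, "0016")
disk_positions = {entry: boot_order.index(entry) for entry in ("000B", "000C")}

print("after PXE:", fallback, "-", labels[fallback])
print("disk positions in BootOrder:", disk_positions)
```

  With this order, the entry after PXE is 0000 (System Utilities), and
  both RAID logical drives come only after it.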

  3) Finalize configuration:
  2020-10-20T06:57:49+00:00 cmp3az2cz20300kv8 cloud-init[2459]: finish: cmd-install/stage-late/maas: SUCCESS: running 'wget --no-proxy http://10-216-240-0--23.maas-internal:5248/MAAS/metadata/latest/by-id/dfkxqh/ --post-data op=netboot_off -O /dev/null'

  4) The server is instructed to reboot. During the reboot it uses the MAAS-provided grub.cfg:
  2020-10-20 06:59:36 provisioningserver.rackdservices.tftp: [info] /grub/grub.cfg-d4:f5:ef:02:28:94 requested by 10.216.240.106

  MAAS provides grub configuration as follows:

  ubuntu@inf1az1cz202904rz:~$ curl tftp://10.216.240.1/grub/grub.cfg-d4:f5:ef:02:28:94
  set default="0"
  set timeout=0

  menuentry 'Local' {
      echo 'Booting local disk...'
      for bootloader in \
              boot/bootx64.efi \
              ubuntu/shimx64.efi \
              ubuntu/grubx64.efi \
              centos/shimx64.efi \
              centos/grubx64.efi \
              redhat/shimx64.efi \
              redhat/grubx64.efi \
              rhel/shimx64.efi \
              rhel/grubx64.efi \
              red/grubx64.efi \
              Microsoft/Boot/bootmgfw.efi; do
          search --set=root --file /efi/$bootloader
          if [ $? -eq 0 ]; then
              chainloader /efi/$bootloader
              boot
          fi
      done
      # If no bootloader is found exit and allow the next device to boot.
      exit
  }

  Unfortunately, this configuration fails to find a bootloader, so GRUB
  exits and the firmware moves on to the next boot entry, which is
  Boot0000* System Utilities.

  In the GRUB environment, the following variables are set:
  grub> set
  grub_platform=efi
  cmd_path=(tftp,10.216.240.1)
  net_default_interface=efinet3
  net_default_ip=10.216.240.106
  net_default_mac=d4:f5:ef:02:28:94
  net_default_server=10.216.240.1
  net_efinet3_boot_file=bootx64.efi
  net_efinet3_domain=mgt.tlc.cloud
  net_efinet3_ip=10.216.240.106
  net_efinet3_mac=d4:f5:ef:02:28:94
  net_efinet3_next_server=10.216.240.1
  package_version=2.02-2ubuntu8.18
  prefix=(tftp,10.216.240.1)/grub
  pxe_default_server=10.216.240.1
  root=tftp,10.216.240.1

  grub> ls
  (memdisk) (hd0) (hd0,gpt1)
  grub> ls (hd0)
  (hd0): Filesystem is unknown.
  grub> ls (hd0,gpt1)
  (hd0,gpt1): Filesystem is unknown.
  grub> ls (memdisk)
  (memdisk): Filesystem is fat.
  grub> ls (memdisk)/
  grub.cfg
  grub> cat (memdisk)/grub.cfg
  if [ -e $prefix/x86_64-efi/grub.cfg ]; then
  	source $prefix/x86_64-efi/grub.cfg
  else
  	source $prefix/grub.cfg
  fi

  Trying to run the MAAS provided config fails:
  grub> search --set=root --file /efi/boot/bootx64.efi
  error: no such device: /efi/boot/bootx64.efi 

  GRUB does not see the logical volumes (sdb, sdc) hosted on the hardware
  RAID controller when the Virtual Install Disk (VID) is enabled.

  After disabling the VID (Intelligent Provisioning->BIOS/Platform Configuration(RBSU)->USB options->Virtual Install Disk-Disable), GRUB lists all the partitions:
  grub> ls
  (hd0) (hd0,gpt2) (hd0,gpt1) (hd1)
  grub> search --set=root --file /efi/boot/bootx64.efi
   hd0,gpt1
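
  The difference between the two cases can be modelled with a small
  simulation of the "Local" menu entry's search loop. This is a sketch
  only: it is not GRUB code, the device contents are reduced to what the
  transcripts above show, and the empty sets for devices whose contents
  are not shown are assumptions.

```python
# Model of the 'Local' menuentry: try each candidate bootloader path in
# order and return the first device on which it can be found.

BOOTLOADERS = [
    "boot/bootx64.efi",
    "ubuntu/shimx64.efi",
    "ubuntu/grubx64.efi",
    "centos/shimx64.efi",
    "centos/grubx64.efi",
    "redhat/shimx64.efi",
    "redhat/grubx64.efi",
    "rhel/shimx64.efi",
    "rhel/grubx64.efi",
    "red/grubx64.efi",
    "Microsoft/Boot/bootmgfw.efi",
]


def find_local_bootloader(visible):
    """`visible` maps a GRUB device name to the set of files readable on it."""
    for bootloader in BOOTLOADERS:
        path = "/efi/" + bootloader
        for device, files in visible.items():
            if path in files:
                return device, path
    # Mirrors the 'exit' at the end of the menu entry: nothing was found.
    return None


# VID enabled: only the virtual disk is visible, and its filesystem is unknown.
vid_enabled = {"memdisk": {"/grub.cfg"}, "hd0": set(), "hd0,gpt1": set()}

# VID disabled: the search finds bootx64.efi on hd0,gpt1. The contents of
# the other devices are not shown in the transcript (assumed empty here).
vid_disabled = {
    "hd0": set(),
    "hd0,gpt2": set(),
    "hd0,gpt1": {"/efi/boot/bootx64.efi"},
    "hd1": set(),
}

print("VID enabled: ", find_local_bootloader(vid_enabled))
print("VID disabled:", find_local_bootloader(vid_disabled))
```

  In the first case the loop finds nothing and the entry exits, which is
  the point at which the firmware moves on to the next boot entry.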

To manage notifications about this bug go to:
https://bugs.launchpad.net/curtin/+bug/1900695/+subscriptions