
kernel-packages team mailing list archive

[Bug 1509029] Re: [Hyper-V] Crash in hot-add/remove scsi devices (smp)

 

In progress. Sorry, there were a lot of -proposed requests this week.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1509029

Title:
  [Hyper-V] Crash in hot-add/remove scsi devices (smp)

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Trusty:
  Fix Committed
Status in linux source package in Vivid:
  Fix Committed
Status in linux source package in Wily:
  Fix Committed
Status in linux source package in Xenial:
  Fix Released

Bug description:
  On some host errors the storvsc module tries to remove the sdev by
  scheduling a job which does the following:

     sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
     if (sdev) {
         scsi_remove_device(sdev);
         scsi_device_put(sdev);
     }

  While this code seems correct, the following crash is observed:

   general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
   RIP: 0010:[<ffffffff81169979>]  [<ffffffff81169979>] bdi_destroy+0x39/0x220
   ...
   [<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40
   [<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270
   [<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod]
   [<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
   [<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
   [<ffffffff81080791>] process_one_work+0x1b1/0x530
   ...

  The problem is that many such jobs (for the same device) are being
  scheduled simultaneously. While scsi_remove_device() takes
  shost->scan_mutex and scsi_device_lookup() fails for a device in the
  SDEV_DEL state, there is no protection against a caller that did
  scsi_device_lookup() before we actually entered __scsi_remove_device().
  So the whole scenario looks like this: two callers call
  scsi_device_lookup() simultaneously (or preemption happens) and both
  calls succeed; after that, both callers call scsi_remove_device().
  shost->scan_mutex only serializes their calls to __scsi_remove_device(),
  so we end up running the cleanup path twice.

  Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
  ---
   drivers/scsi/scsi_sysfs.c | 8 ++++++++
   1 file changed, 8 insertions(+)

  diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
  index b333389..e0d2707 100644
  --- a/drivers/scsi/scsi_sysfs.c
  +++ b/drivers/scsi/scsi_sysfs.c
  @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
   {
          struct device *dev = &sdev->sdev_gendev;

  +       /*
  +        * This cleanup path is not reentrant and while it is impossible
  +        * to get a new reference with scsi_device_get() someone can still
  +        * hold a previously acquired one.
  +        */
  +       if (sdev->sdev_state == SDEV_DEL)
  +               return;
  +
          if (sdev->is_visible) {
                  if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
                          return;

  
  --
  2.4.3
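
The double-cleanup scenario described in the bug can be illustrated with a
small userspace model. This is only a sketch under stated assumptions: the
names model_sdev, model_lookup, model_remove and cleanup_runs_for_two_jobs
are hypothetical stand-ins rather than kernel APIs, the two jobs are run one
after the other to mimic the serialization that shost->scan_mutex provides,
and the is_visible handling of the real function is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's definitions. */
enum model_state { MODEL_RUNNING, MODEL_CANCEL, MODEL_DEL };

struct model_sdev {
    enum model_state state;
    int refcount;      /* references handed out by model_lookup() */
    int cleanup_runs;  /* how many times the teardown path ran */
};

/* Mimics scsi_device_lookup(): fails once the device is in the DEL state. */
static struct model_sdev *model_lookup(struct model_sdev *sdev)
{
    if (sdev->state == MODEL_DEL)
        return NULL;
    sdev->refcount++;
    return sdev;
}

/*
 * Mimics __scsi_remove_device() in a simplified form. 'guarded' toggles
 * the early return on the DEL state that the patch adds.
 */
static void model_remove(struct model_sdev *sdev, bool guarded)
{
    assert(sdev != NULL);
    if (guarded && sdev->state == MODEL_DEL)
        return;
    sdev->state = MODEL_CANCEL;
    sdev->cleanup_runs++;  /* stands in for the non-reentrant cleanup */
    sdev->state = MODEL_DEL;
}

/* Two jobs both look the device up before either one removes it. */
int cleanup_runs_for_two_jobs(bool guarded)
{
    struct model_sdev dev = { MODEL_RUNNING, 0, 0 };
    struct model_sdev *a = model_lookup(&dev);
    struct model_sdev *b = model_lookup(&dev);

    if (a)
        model_remove(a, guarded);
    if (b)
        model_remove(b, guarded);
    return dev.cleanup_runs;
}
```

In this model both lookups succeed because the device is not yet in the DEL
state, so cleanup_runs_for_two_jobs(false) counts two cleanup passes, while
cleanup_runs_for_two_jobs(true) counts one because the second caller returns
early.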

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1509029/+subscriptions

