
kernel-packages team mailing list archive

[Bug 1559609] Re: arcmsr times out with ARC1882 RAID card

 

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Andy Whitcroft (apw)

** Changed in: linux (Ubuntu)
    Milestone: None => ubuntu-16.03

** Description changed:

  Tested the latest Xenial ISO on a file server featuring an ARC-1882ix-24
  RAID controller, and got weird timeout issues, followed by complete loss
  of access to anything connected to the RAID controller. The timeouts
  occur after a random amount of uptime (sometimes minutes, sometimes
  days), for example:
  
  kernel: [1665409.969229] arcmsr2: abort device command of scsi id = 0 lun = 1
  kernel: [1665411.727535] arcmsr2: scsi id = 0 lun = 1 ccb = '0xffff884fe008e780' poll command abort successfully
  kernel: [1665411.727885] arcmsr2: abort device command of scsi id = 0 lun = 1
  kernel: [1665411.727898] arcmsr2: abort device command of scsi id = 0 lun = 1
  kernel: [1665413.138235] arcmsr2: scsi id = 0 lun = 1 ccb = '0xffff884fe0012300' poll command abort successfully
  ...
  kernel: [1665445.804546] arcmsr: executing bus reset eh.....num_resets = 2, num_aborts = 146
  kernel: [1665455.851353] arcmsr2: pCCB ='0xffff884fe002a700' isr got aborted command
  kernel: [1665455.851366] arcmsr2: pCCB ='0xffff884fe01c0a00' isr got aborted command
  kernel: [1665455.851373] arcmsr2: isr get an illegal ccb command #011#011#011#011done acb = '0xffff884fe0b8c798'ccb = '0xffff884fe00e9680' ccbacb = '0xffff884fe0b8c798' startdone = 0x0 ccboutstandingcount = -1
  kernel: [1665455.851378] arcmsr2: isr get an illegal ccb command #011#011#011#011done acb = '0xffff884fe0b8c798'ccb = '0xffff884fe0070280' ccbacb = '0xffff884fe0b8c798' startdone = 0x0 ccboutstandingcount = -1
  ...
  kernel: [1665455.852655] sd 2:0:0:3: [sdd] Medium access timeout failure. Offlining disk!
  kernel: [1665455.890032] sd 2:0:0:4: [sde] Medium access timeout failure. Offlining disk!
  kernel: [1665455.926613] sd 2:0:0:1: [sdb] Medium access timeout failure. Offlining disk!
  kernel: [1665455.963288] sd 2:0:0:2: [sdc] Medium access timeout failure. Offlining disk!
  
- 
- Some digging revealed that mainline 4.4 as well as Xenial's 4.4.0-14-generic still ship an old, buggy arcmsr driver, v1.30.00.04-20140919, which claims to "support" the 1882 but does not really do so.
+ Some digging revealed that mainline 4.4 as well as Xenial's
+ 4.4.0-14-generic still ship an old, buggy arcmsr driver,
+ v1.30.00.04-20140919, which claims to "support" the 1882 but does not
+ really do so.
  
  Areca seems to have gotten a fixed driver into mainline 4.5
  (version v1.30.00.22-20151126). The change appears to be a small patch
  to arcmsr.h and a large one to arcmsr_hba.c, and at first glance I
  didn't see anything 4.5-specific in the code:
  
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/drivers/scsi/arcmsr/arcmsr.h?id=v4.5&id2=v4.4
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/drivers/scsi/arcmsr/arcmsr_hba.c?id=v4.5&id2=v4.4
  
  Note that we are using v1.30.0X.21-20151016 (as provided by
  Areca.com.tw) on production 14.04.4 LTS servers featuring ARC1882
  controllers, so chances are good that version 22 (as included in 4.5
  mainline) will work well.
  
  This would not only allow ARC-188x controllers to work properly with
  Xenial out of the box; it should also add support for the (somewhat
  popular?) ARC-1203 series.
+ 
+ ===
+ Kernel-Description: update arcmsr to version v1.30.00.22-20151126 to fix card timeouts

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1559609

Title:
  arcmsr times out with ARC1882 RAID card

Status in linux package in Ubuntu:
  Confirmed

Bug description:
  Tested the latest Xenial ISO on a file server featuring an
  ARC-1882ix-24 RAID controller, and got weird timeout issues, followed by
  complete loss of access to anything connected to the RAID controller.
  The timeouts occur after a random amount of uptime (sometimes minutes,
  sometimes days), for example:

  kernel: [1665409.969229] arcmsr2: abort device command of scsi id = 0 lun = 1
  kernel: [1665411.727535] arcmsr2: scsi id = 0 lun = 1 ccb = '0xffff884fe008e780' poll command abort successfully
  kernel: [1665411.727885] arcmsr2: abort device command of scsi id = 0 lun = 1
  kernel: [1665411.727898] arcmsr2: abort device command of scsi id = 0 lun = 1
  kernel: [1665413.138235] arcmsr2: scsi id = 0 lun = 1 ccb = '0xffff884fe0012300' poll command abort successfully
  ...
  kernel: [1665445.804546] arcmsr: executing bus reset eh.....num_resets = 2, num_aborts = 146
  kernel: [1665455.851353] arcmsr2: pCCB ='0xffff884fe002a700' isr got aborted command
  kernel: [1665455.851366] arcmsr2: pCCB ='0xffff884fe01c0a00' isr got aborted command
  kernel: [1665455.851373] arcmsr2: isr get an illegal ccb command #011#011#011#011done acb = '0xffff884fe0b8c798'ccb = '0xffff884fe00e9680' ccbacb = '0xffff884fe0b8c798' startdone = 0x0 ccboutstandingcount = -1
  kernel: [1665455.851378] arcmsr2: isr get an illegal ccb command #011#011#011#011done acb = '0xffff884fe0b8c798'ccb = '0xffff884fe0070280' ccbacb = '0xffff884fe0b8c798' startdone = 0x0 ccboutstandingcount = -1
  ...
  kernel: [1665455.852655] sd 2:0:0:3: [sdd] Medium access timeout failure. Offlining disk!
  kernel: [1665455.890032] sd 2:0:0:4: [sde] Medium access timeout failure. Offlining disk!
  kernel: [1665455.926613] sd 2:0:0:1: [sdb] Medium access timeout failure. Offlining disk!
  kernel: [1665455.963288] sd 2:0:0:2: [sdc] Medium access timeout failure. Offlining disk!
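
A log like the one above can be summarized mechanically when triaging an affected machine. The following is a minimal Python sketch, not part of the original report: the sample lines are copied from the excerpt, while the function name `summarize` and both regular expressions are assumptions fitted to these message formats only and may not match other arcmsr or sd driver versions.

```python
import re

# Sample lines copied from the log excerpt in this bug report.
LOG = """\
kernel: [1665445.804546] arcmsr: executing bus reset eh.....num_resets = 2, num_aborts = 146
kernel: [1665455.852655] sd 2:0:0:3: [sdd] Medium access timeout failure. Offlining disk!
kernel: [1665455.890032] sd 2:0:0:4: [sde] Medium access timeout failure. Offlining disk!
kernel: [1665455.926613] sd 2:0:0:1: [sdb] Medium access timeout failure. Offlining disk!
kernel: [1665455.963288] sd 2:0:0:2: [sdc] Medium access timeout failure. Offlining disk!
"""

# Assumed patterns, derived only from the lines quoted above.
RESET_RE = re.compile(
    r"arcmsr: executing bus reset eh\.+num_resets = (\d+), num_aborts = (\d+)"
)
OFFLINE_RE = re.compile(
    r"sd (\d+:\d+:\d+:\d+): \[(\w+)\] Medium access timeout failure\. Offlining disk!"
)


def summarize(log_text):
    """Return (resets, offlined) found in log_text.

    resets   -- list of (num_resets, num_aborts) tuples, one per bus reset line
    offlined -- list of (scsi_address, device_name) tuples, in log order
    """
    resets = []
    offlined = []
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            resets.append((int(match.group(1)), int(match.group(2))))
            continue
        match = OFFLINE_RE.search(line)
        if match:
            offlined.append((match.group(1), match.group(2)))
    return resets, offlined


resets, offlined = summarize(LOG)
print("bus resets (num_resets, num_aborts):", resets)
print("offlined disks:", offlined)
```

On a live system the same function could be fed the output of `dmesg` or the contents of a syslog file instead of the embedded sample.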

  Some digging revealed that mainline 4.4 as well as Xenial's
  4.4.0-14-generic still ship an old, buggy arcmsr driver,
  v1.30.00.04-20140919, which claims to "support" the 1882 but does not
  really do so.

  Areca seems to have gotten a fixed driver into mainline 4.5
  (version v1.30.00.22-20151126). The change appears to be a small patch
  to arcmsr.h and a large one to arcmsr_hba.c, and at first glance I
  didn't see anything 4.5-specific in the code:

  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/drivers/scsi/arcmsr/arcmsr.h?id=v4.5&id2=v4.4
  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/diff/drivers/scsi/arcmsr/arcmsr_hba.c?id=v4.5&id2=v4.4

  Note that we are using v1.30.0X.21-20151016 (as provided by
  Areca.com.tw) on production 14.04.4 LTS servers featuring ARC1882
  controllers, so chances are good that version 22 (as included in 4.5
  mainline) will work well.
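
To check which of these driver versions a given machine is running, the version string (for example, as reported by `modinfo arcmsr`) can be compared against the mainline 4.5 version. The sketch below is illustrative only: the helper names `parse_version` and `is_at_least` are hypothetical, and it assumes that the fourth dotted field and the trailing date increase with each Areca release, which this report suggests but does not confirm.

```python
import re

# Version strings quoted in this bug report.
OLD_4_4 = "v1.30.00.04-20140919"      # mainline 4.4 / Xenial 4.4.0-14-generic
VENDOR_14_04 = "v1.30.0X.21-20151016"  # vendor driver in use on 14.04.4
MAINLINE_4_5 = "v1.30.00.22-20151126"  # mainline 4.5

# Assumed layout: v<major>.<minor>.<variant>.<build>-<YYYYMMDD>.
# The third field is matched loosely because the vendor string contains "0X".
VERSION_RE = re.compile(r"v(\d+)\.(\d+)\.(\w+)\.(\d+)-(\d{8})")


def parse_version(version):
    """Return (build_number, date_string) for an arcmsr version string."""
    match = VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError("unrecognized arcmsr version: %r" % version)
    return int(match.group(4)), match.group(5)


def is_at_least(version, baseline):
    """True if version is the same as or newer than baseline.

    Compares the build number first, then the YYYYMMDD date string.
    """
    return parse_version(version) >= parse_version(baseline)


for candidate in (OLD_4_4, VENDOR_14_04, MAINLINE_4_5):
    print(candidate, "at least 4.5 mainline:", is_at_least(candidate, MAINLINE_4_5))
```

Note that the vendor build 21 compares as older than the mainline build 22 even though it is reported above to work in production, so the comparison only orders versions and does not by itself say whether a given build contains the timeout fix.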

  This would not only allow ARC-188x controllers to work properly with
  Xenial out of the box; it should also add support for the (somewhat
  popular?) ARC-1203 series.

  ===
  Kernel-Description: update arcmsr to version v1.30.00.22-20151126 to fix card timeouts

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1559609/+subscriptions

