kernel-packages team mailing list archive

[Bug 1439872] Re: kernel panic involving drbd

 

The 3.16 series is missing the following fix:

commit a88215312c5ed74697973f6c9f0fce718bcf18ad
Author: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
Date:   Mon Nov 10 17:21:11 2014 +0100

    drbd: fix race between role change and handshake
    
    Symptoms:
    If DRBD was "cleanly shut down" (all in sync, both Secondary before
    disconnect, identical data generation uuids), and then one side was
    promoted *during* the next connection handshake, the role change
    could confuse the handshake.
    
    The Primary would get stuck in WFBitmapS, the Secondary would log
    unexpected cstate (Connected) in receive_bitmap
    and get stuck in WFBitmapT.
    
    Fix:
    The test in is_valid_soft_transition is wrong. It works because
    the not allowed actions (promote/attach) do not touch the
    cstate. The previous condition failed to demand a cstate change
    in one clause.
    
    In order to avoid deadlocks give up the state_mutex while waiting
    for the transient state to go away.
    
    Conflicts:
        drbd/drbd_state.c
        drbd/drbd_state.h
        drbd/drbd_wrappers.h
    
    Signed-off-by: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
    Signed-off-by: Lars Ellenberg <lars.ellenberg@xxxxxxxxxx>
    Signed-off-by: Jens Axboe <axboe@xxxxxx>

That probably fixes this issue. I will provide a PPA with a kernel to be
tested.
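
To illustrate the last paragraph of the commit message (give up the
mutex while waiting for a transient state to go away), here is a minimal
user-space sketch in Python. It is not DRBD code: the class StateMachine
and the methods begin_handshake, end_handshake and promote are
hypothetical names made up for this example, and threading.Condition
only stands in for the state_mutex plus wait queue as an analogy.

```python
import threading


class StateMachine:
    """Toy stand-in (hypothetical) for a resource with a transient state."""

    def __init__(self):
        # The Condition owns the lock that plays the role of a state mutex.
        self._cond = threading.Condition()
        self.transient = False
        self.role = "Secondary"

    def begin_handshake(self):
        # Enter the transient state, e.g. while a handshake is in flight.
        with self._cond:
            self.transient = True

    def end_handshake(self):
        # Leave the transient state and wake up anyone waiting for that.
        with self._cond:
            self.transient = False
            self._cond.notify_all()

    def promote(self, timeout=5.0):
        # wait_for() releases the lock while blocked and re-acquires it
        # before returning, so end_handshake() can take the lock and make
        # progress instead of deadlocking against the waiter.
        with self._cond:
            if not self._cond.wait_for(lambda: not self.transient, timeout):
                return False  # still transient after the timeout
            self.role = "Primary"
            return True
```

If promote() kept the lock while polling for the transient flag to clear,
end_handshake() could never acquire it and both sides would hang, which
is the kind of deadlock the commit message says it avoids.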

** Description changed:

- tinoco@freenode said:
- you are facing a probable race condition for drbd
- checking if the commit (fixing a race condition) is on the 3.16 kernel 
- we are missing this fix:
- commit a88215312c5ed74697973f6c9f0fce718bcf18ad
- Author: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
- Date:   Mon Nov 10 17:21:11 2014 +0100
- drbd: fix race between role change and handshake
- probably
- i need you to open a bug in launchpad for the "linux" package
- and let me know the number
- i'll provide the fix and ask the kernel team to fix this
+ The following kernel panic was brought to my attention:
  
- Distributor ID: Ubuntu
- Description:    Ubuntu 14.04.2 LTS
+ [1191751.738854] request: minor=1, resource=vm-appserver; but that minor belongs to resource libvirt
+ [1191759.892350] drbd vm-database: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown ) 
+ [1191759.892359] drbd vm-database: asender terminated
+ [1191759.892362] drbd vm-database: Terminating drbd_a_vm-datab
+ [1191759.892471] drbd vm-database: Connection closed
+ [1191759.892480] drbd vm-database: conn( Disconnecting -> StandAlone ) 
+ [1191759.892481] drbd vm-database: receiver terminated
+ [1191759.892485] drbd vm-database: Terminating drbd_r_vm-datab
+ [1191759.892497] block drbd6: disk( UpToDate -> Failed ) 
+ [1191759.902311] block drbd6: bitmap WRITE of 0 pages took 0 jiffies
+ [1191759.902315] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
+ [1191759.902322] block drbd6: disk( Failed -> Diskless ) 
+ [1191759.902565] block drbd6: drbd_bm_resize called with capacity == 0
+ [1191759.902585] drbd vm-database: Terminating drbd_w_vm-datab
+ [1191992.802513] INFO: task drbdsetup:20254 blocked for more than 120 seconds.
+ [1191992.834141]       Not tainted 3.16.0-31-generic #41~14.04.1-Ubuntu
+ [1191992.862889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
+ [1191992.900367] drbdsetup       D ffff88085fc53440     0 20254      1 0x00000004
+ [1191992.900371]  ffff880845967ab8 0000000000000086 ffff880abe65c750 ffff880845967fd8
+ [1191992.900374]  0000000000013440 0000000000013440 ffff8808546d65e0 ffff8808b5365800
+ [1191992.900378]  0000000000001e00 0000000000000400 000000000000000e ffff8808b5365a30
+ [1191992.900388] Call Trace:
+ [1191992.900398]  [<ffffffff817675c9>] schedule+0x29/0x70
+ [1191992.900407]  [<ffffffffc078fba5>] _drbd_request_state+0x65/0xb0 [drbd]
+ [1191992.900413]  [<ffffffff810b4d10>] ? prepare_to_wait_event+0x100/0x100
+ [1191992.900418]  [<ffffffffc0787a3e>] adm_detach.part.52+0x3e/0x100 [drbd]
+ [1191992.900422]  [<ffffffffc07842f1>] ? drbd_adm_prepare.isra.48+0xd1/0x4e0 [drbd]
+ [1191992.900426]  [<ffffffffc0787c49>] drbd_adm_detach+0x149/0x150 [drbd]
+ [1191992.900431]  [<ffffffff81692a39>] genl_family_rcv_msg+0x199/0x380
+ [1191992.900432]  [<ffffffff81692c20>] ? genl_family_rcv_msg+0x380/0x380
+ [1191992.900434]  [<ffffffff81692cb1>] genl_rcv_msg+0x91/0xd0
+ [1191992.900436]  [<ffffffff81690d39>] netlink_rcv_skb+0xa9/0xc0
+ [1191992.900438]  [<ffffffff81691238>] genl_rcv+0x28/0x40
+ [1191992.900439]  [<ffffffff816903f3>] netlink_unicast+0xf3/0x200
+ [1191992.900441]  [<ffffffff81690815>] netlink_sendmsg+0x315/0x680
+ [1191992.900448]  [<ffffffff81333d7d>] ? aa_sk_perm.isra.4+0x6d/0x150
+ [1191992.900452]  [<ffffffff8164625e>] sock_aio_write+0xfe/0x130
+ [1191992.900456]  [<ffffffff811d358a>] do_sync_write+0x5a/0x90
+ [1191992.900458]  [<ffffffff811d4005>] vfs_write+0x195/0x1f0
+ [1191992.900461]  [<ffffffff811d4ac6>] SyS_write+0x46/0xb0
+ [1191992.900464]  [<ffffffff8176b66d>] system_call_fastpath+0x1a/0x1f
+ [1191992.900465] sending NMI to all CPUs:
  
- Linux bluegrass4 3.16.0-31-generic #41~14.04.1-Ubuntu SMP Wed Feb 11
- 19:30:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
- 
- Kernel panic, dump at:
+ Dump at:
  http://telsasoft.com/tmp/varcrash-201503271156.tar.bz2

** Description changed:

  The following kernel panic was brought to my attention:
  
+   SYSTEM MAP: /boot/System.map-3.16.0-31-generic
+ DEBUG KERNEL: /usr/lib/debug/boot/vmlinux-3.16.0-31-generic  
+     DUMPFILE: ./dump.201503271156  [PARTIAL DUMP]
+         CPUS: 12
+         DATE: Fri Mar 27 12:56:49 2015
+       UPTIME: 13 days, 19:14:00
+ LOAD AVERAGE: 1.72, 0.67, 0.28
+        TASKS: 340
+     NODENAME: bluegrass3
+      RELEASE: 3.16.0-31-generic
+      VERSION: #41~14.04.1-Ubuntu SMP Wed Feb 11 19:30:13 UTC 2015
+      MACHINE: x86_64  (2397 Mhz)
+       MEMORY: 63.9 GB
+        PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
+ 
  [1191751.738854] request: minor=1, resource=vm-appserver; but that minor belongs to resource libvirt
- [1191759.892350] drbd vm-database: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown ) 
+ [1191759.892350] drbd vm-database: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
  [1191759.892359] drbd vm-database: asender terminated
  [1191759.892362] drbd vm-database: Terminating drbd_a_vm-datab
  [1191759.892471] drbd vm-database: Connection closed
- [1191759.892480] drbd vm-database: conn( Disconnecting -> StandAlone ) 
+ [1191759.892480] drbd vm-database: conn( Disconnecting -> StandAlone )
  [1191759.892481] drbd vm-database: receiver terminated
  [1191759.892485] drbd vm-database: Terminating drbd_r_vm-datab
- [1191759.892497] block drbd6: disk( UpToDate -> Failed ) 
+ [1191759.892497] block drbd6: disk( UpToDate -> Failed )
  [1191759.902311] block drbd6: bitmap WRITE of 0 pages took 0 jiffies
  [1191759.902315] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
- [1191759.902322] block drbd6: disk( Failed -> Diskless ) 
+ [1191759.902322] block drbd6: disk( Failed -> Diskless )
  [1191759.902565] block drbd6: drbd_bm_resize called with capacity == 0
  [1191759.902585] drbd vm-database: Terminating drbd_w_vm-datab
  [1191992.802513] INFO: task drbdsetup:20254 blocked for more than 120 seconds.
  [1191992.834141]       Not tainted 3.16.0-31-generic #41~14.04.1-Ubuntu
  [1191992.862889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [1191992.900367] drbdsetup       D ffff88085fc53440     0 20254      1 0x00000004
  [1191992.900371]  ffff880845967ab8 0000000000000086 ffff880abe65c750 ffff880845967fd8
  [1191992.900374]  0000000000013440 0000000000013440 ffff8808546d65e0 ffff8808b5365800
  [1191992.900378]  0000000000001e00 0000000000000400 000000000000000e ffff8808b5365a30
  [1191992.900388] Call Trace:
  [1191992.900398]  [<ffffffff817675c9>] schedule+0x29/0x70
  [1191992.900407]  [<ffffffffc078fba5>] _drbd_request_state+0x65/0xb0 [drbd]
  [1191992.900413]  [<ffffffff810b4d10>] ? prepare_to_wait_event+0x100/0x100
  [1191992.900418]  [<ffffffffc0787a3e>] adm_detach.part.52+0x3e/0x100 [drbd]
  [1191992.900422]  [<ffffffffc07842f1>] ? drbd_adm_prepare.isra.48+0xd1/0x4e0 [drbd]
  [1191992.900426]  [<ffffffffc0787c49>] drbd_adm_detach+0x149/0x150 [drbd]
  [1191992.900431]  [<ffffffff81692a39>] genl_family_rcv_msg+0x199/0x380
  [1191992.900432]  [<ffffffff81692c20>] ? genl_family_rcv_msg+0x380/0x380
  [1191992.900434]  [<ffffffff81692cb1>] genl_rcv_msg+0x91/0xd0
  [1191992.900436]  [<ffffffff81690d39>] netlink_rcv_skb+0xa9/0xc0
  [1191992.900438]  [<ffffffff81691238>] genl_rcv+0x28/0x40
  [1191992.900439]  [<ffffffff816903f3>] netlink_unicast+0xf3/0x200
  [1191992.900441]  [<ffffffff81690815>] netlink_sendmsg+0x315/0x680
  [1191992.900448]  [<ffffffff81333d7d>] ? aa_sk_perm.isra.4+0x6d/0x150
  [1191992.900452]  [<ffffffff8164625e>] sock_aio_write+0xfe/0x130
  [1191992.900456]  [<ffffffff811d358a>] do_sync_write+0x5a/0x90
  [1191992.900458]  [<ffffffff811d4005>] vfs_write+0x195/0x1f0
  [1191992.900461]  [<ffffffff811d4ac6>] SyS_write+0x46/0xb0
  [1191992.900464]  [<ffffffff8176b66d>] system_call_fastpath+0x1a/0x1f
  [1191992.900465] sending NMI to all CPUs:
  
  Dump at:
  http://telsasoft.com/tmp/varcrash-201503271156.tar.bz2

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1439872

Title:
  kernel panic involving drbd

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  The following kernel panic was brought to my attention:

    SYSTEM MAP: /boot/System.map-3.16.0-31-generic
  DEBUG KERNEL: /usr/lib/debug/boot/vmlinux-3.16.0-31-generic  
      DUMPFILE: ./dump.201503271156  [PARTIAL DUMP]
          CPUS: 12
          DATE: Fri Mar 27 12:56:49 2015
        UPTIME: 13 days, 19:14:00
  LOAD AVERAGE: 1.72, 0.67, 0.28
         TASKS: 340
      NODENAME: bluegrass3
       RELEASE: 3.16.0-31-generic
       VERSION: #41~14.04.1-Ubuntu SMP Wed Feb 11 19:30:13 UTC 2015
       MACHINE: x86_64  (2397 Mhz)
        MEMORY: 63.9 GB
         PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"

  [1191751.738854] request: minor=1, resource=vm-appserver; but that minor belongs to resource libvirt
  [1191759.892350] drbd vm-database: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
  [1191759.892359] drbd vm-database: asender terminated
  [1191759.892362] drbd vm-database: Terminating drbd_a_vm-datab
  [1191759.892471] drbd vm-database: Connection closed
  [1191759.892480] drbd vm-database: conn( Disconnecting -> StandAlone )
  [1191759.892481] drbd vm-database: receiver terminated
  [1191759.892485] drbd vm-database: Terminating drbd_r_vm-datab
  [1191759.892497] block drbd6: disk( UpToDate -> Failed )
  [1191759.902311] block drbd6: bitmap WRITE of 0 pages took 0 jiffies
  [1191759.902315] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
  [1191759.902322] block drbd6: disk( Failed -> Diskless )
  [1191759.902565] block drbd6: drbd_bm_resize called with capacity == 0
  [1191759.902585] drbd vm-database: Terminating drbd_w_vm-datab
  [1191992.802513] INFO: task drbdsetup:20254 blocked for more than 120 seconds.
  [1191992.834141]       Not tainted 3.16.0-31-generic #41~14.04.1-Ubuntu
  [1191992.862889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [1191992.900367] drbdsetup       D ffff88085fc53440     0 20254      1 0x00000004
  [1191992.900371]  ffff880845967ab8 0000000000000086 ffff880abe65c750 ffff880845967fd8
  [1191992.900374]  0000000000013440 0000000000013440 ffff8808546d65e0 ffff8808b5365800
  [1191992.900378]  0000000000001e00 0000000000000400 000000000000000e ffff8808b5365a30
  [1191992.900388] Call Trace:
  [1191992.900398]  [<ffffffff817675c9>] schedule+0x29/0x70
  [1191992.900407]  [<ffffffffc078fba5>] _drbd_request_state+0x65/0xb0 [drbd]
  [1191992.900413]  [<ffffffff810b4d10>] ? prepare_to_wait_event+0x100/0x100
  [1191992.900418]  [<ffffffffc0787a3e>] adm_detach.part.52+0x3e/0x100 [drbd]
  [1191992.900422]  [<ffffffffc07842f1>] ? drbd_adm_prepare.isra.48+0xd1/0x4e0 [drbd]
  [1191992.900426]  [<ffffffffc0787c49>] drbd_adm_detach+0x149/0x150 [drbd]
  [1191992.900431]  [<ffffffff81692a39>] genl_family_rcv_msg+0x199/0x380
  [1191992.900432]  [<ffffffff81692c20>] ? genl_family_rcv_msg+0x380/0x380
  [1191992.900434]  [<ffffffff81692cb1>] genl_rcv_msg+0x91/0xd0
  [1191992.900436]  [<ffffffff81690d39>] netlink_rcv_skb+0xa9/0xc0
  [1191992.900438]  [<ffffffff81691238>] genl_rcv+0x28/0x40
  [1191992.900439]  [<ffffffff816903f3>] netlink_unicast+0xf3/0x200
  [1191992.900441]  [<ffffffff81690815>] netlink_sendmsg+0x315/0x680
  [1191992.900448]  [<ffffffff81333d7d>] ? aa_sk_perm.isra.4+0x6d/0x150
  [1191992.900452]  [<ffffffff8164625e>] sock_aio_write+0xfe/0x130
  [1191992.900456]  [<ffffffff811d358a>] do_sync_write+0x5a/0x90
  [1191992.900458]  [<ffffffff811d4005>] vfs_write+0x195/0x1f0
  [1191992.900461]  [<ffffffff811d4ac6>] SyS_write+0x46/0xb0
  [1191992.900464]  [<ffffffff8176b66d>] system_call_fastpath+0x1a/0x1f
  [1191992.900465] sending NMI to all CPUs:

  Dump at:
  http://telsasoft.com/tmp/varcrash-201503271156.tar.bz2
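
For anyone comparing logs from other affected nodes, the drbd state
changes in the log above can be pulled out with a small script. This is
only a sketch: the function name parse_transitions and the two regular
expressions are my own assumptions based on the line format shown in
this report, not a documented drbd log format.

```python
import re

# One "field( Old -> New )" group as it appears in the drbd lines above.
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")
# Leading "[seconds.micros]" kernel timestamp; an optional diff marker
# ("+" or "-") and whitespace in front of it are tolerated.
TIMESTAMP = re.compile(r"^[\s+\-]*\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_transitions(lines):
    """Return (timestamp, field, old_state, new_state) tuples."""
    events = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # not a timestamped kernel log line
        stamp, rest = float(match.group(1)), match.group(2)
        for field, old, new in TRANSITION.findall(rest):
            events.append((stamp, field, old, new))
    return events
```

Feeding it the lines of this report yields the peer, conn, pdsk and disk
transitions with their timestamps, and skips the call trace lines because
they contain no "Old -> New" group.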

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1439872/+subscriptions

