kernel-packages team mailing list archive
Message #111340
[Bug 1439872] Re: kernel panic involving drbd
We are missing the following fix for the 3.16 series:
commit a88215312c5ed74697973f6c9f0fce718bcf18ad
Author: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
Date: Mon Nov 10 17:21:11 2014 +0100
drbd: fix race between role change and handshake
Symptoms:
If DRBD was "cleanly shut down" (all in sync, both Secondary before
disconnect, identical data generation uuids), and then one side was
promoted *during* the next connection handshake, the role change
could confuse the handshake.
The Primary would get stuck in WFBitmapS, the Secondary would log
unexpected cstate (Connected) in receive_bitmap
and get stuck in WFBitmapT.
Fix:
The test in is_valid_soft_transition was wrong. It works because
the disallowed actions (promote/attach) do not touch the
cstate. The previous condition failed to demand a cstate change
in one clause.
In order to avoid deadlocks give up the state_mutex while waiting
for the transient state to go away.
Conflicts:
drbd/drbd_state.c
drbd/drbd_state.h
drbd/drbd_wrappers.h
Signed-off-by: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
Signed-off-by: Lars Ellenberg <lars.ellenberg@xxxxxxxxxx>
Signed-off-by: Jens Axboe <axboe@xxxxxx>
That probably fixes this issue. Will provide a PPA with a kernel to be
tested.
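To illustrate the second half of the fix (giving up the state mutex while waiting for the transient state to go away), here is a minimal user-space sketch. It is not DRBD code: the Resource class, its method names, and the choice of "WFReportParams" as the transient state are assumptions made for illustration. The point is only that the waiter must release the lock while it sleeps, otherwise the thread that ends the transient state can never take it.

```python
import threading


class Resource:
    """Hypothetical stand-in for a replicated resource; not DRBD code."""

    def __init__(self):
        # The lock inside this Condition plays the role of state_mutex.
        self._cond = threading.Condition()
        self.cstate = "WFReportParams"  # assumed transient handshake state
        self.role = "Secondary"

    def finish_handshake(self):
        # Called by the "receiver" once the handshake has completed.
        with self._cond:
            self.cstate = "Connected"
            self._cond.notify_all()

    def promote(self, timeout=5.0):
        # Role change requested while the handshake may still be running.
        with self._cond:
            # wait_for() releases the lock while sleeping, so
            # finish_handshake() can acquire it and leave the transient
            # state. Holding the lock across the wait would deadlock.
            left_transient = self._cond.wait_for(
                lambda: self.cstate != "WFReportParams", timeout=timeout
            )
            if not left_transient:
                return False
            self.role = "Primary"
            return True


def demo():
    res = Resource()
    # Complete the handshake shortly after promote() has started waiting.
    timer = threading.Timer(0.05, res.finish_handshake)
    timer.start()
    promoted = res.promote()
    timer.join()
    return promoted, res.role, res.cstate
```

Calling demo() runs the promotion concurrently with the simulated handshake; the promotion only proceeds after the connection state has left the transient value.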
** Description changed:
- tinoco@freenode said:
- you are facing a probable race condition for drbd
- checking if the commit (fixing a race condition) is on the 3.16 kernel
- we are missing this fix:
- commit a88215312c5ed74697973f6c9f0fce718bcf18ad
- Author: Philipp Reisner <philipp.reisner@xxxxxxxxxx>
- Date: Mon Nov 10 17:21:11 2014 +0100
- drbd: fix race between role change and handshake
- probably
- i need you to open a bug in launchpad for the "linux" package
- and let me know the number
- i'll provide the fix and ask the kernel team to fix this
+ The following kernel panic was brought to my attention:
- Distributor ID: Ubuntu
- Description: Ubuntu 14.04.2 LTS
+ [1191751.738854] request: minor=1, resource=vm-appserver; but that minor belongs to resource libvirt
+ [1191759.892350] drbd vm-database: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
+ [1191759.892359] drbd vm-database: asender terminated
+ [1191759.892362] drbd vm-database: Terminating drbd_a_vm-datab
+ [1191759.892471] drbd vm-database: Connection closed
+ [1191759.892480] drbd vm-database: conn( Disconnecting -> StandAlone )
+ [1191759.892481] drbd vm-database: receiver terminated
+ [1191759.892485] drbd vm-database: Terminating drbd_r_vm-datab
+ [1191759.892497] block drbd6: disk( UpToDate -> Failed )
+ [1191759.902311] block drbd6: bitmap WRITE of 0 pages took 0 jiffies
+ [1191759.902315] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
+ [1191759.902322] block drbd6: disk( Failed -> Diskless )
+ [1191759.902565] block drbd6: drbd_bm_resize called with capacity == 0
+ [1191759.902585] drbd vm-database: Terminating drbd_w_vm-datab
+ [1191992.802513] INFO: task drbdsetup:20254 blocked for more than 120 seconds.
+ [1191992.834141] Not tainted 3.16.0-31-generic #41~14.04.1-Ubuntu
+ [1191992.862889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
+ [1191992.900367] drbdsetup D ffff88085fc53440 0 20254 1 0x00000004
+ [1191992.900371] ffff880845967ab8 0000000000000086 ffff880abe65c750 ffff880845967fd8
+ [1191992.900374] 0000000000013440 0000000000013440 ffff8808546d65e0 ffff8808b5365800
+ [1191992.900378] 0000000000001e00 0000000000000400 000000000000000e ffff8808b5365a30
+ [1191992.900388] Call Trace:
+ [1191992.900398] [<ffffffff817675c9>] schedule+0x29/0x70
+ [1191992.900407] [<ffffffffc078fba5>] _drbd_request_state+0x65/0xb0 [drbd]
+ [1191992.900413] [<ffffffff810b4d10>] ? prepare_to_wait_event+0x100/0x100
+ [1191992.900418] [<ffffffffc0787a3e>] adm_detach.part.52+0x3e/0x100 [drbd]
+ [1191992.900422] [<ffffffffc07842f1>] ? drbd_adm_prepare.isra.48+0xd1/0x4e0 [drbd]
+ [1191992.900426] [<ffffffffc0787c49>] drbd_adm_detach+0x149/0x150 [drbd]
+ [1191992.900431] [<ffffffff81692a39>] genl_family_rcv_msg+0x199/0x380
+ [1191992.900432] [<ffffffff81692c20>] ? genl_family_rcv_msg+0x380/0x380
+ [1191992.900434] [<ffffffff81692cb1>] genl_rcv_msg+0x91/0xd0
+ [1191992.900436] [<ffffffff81690d39>] netlink_rcv_skb+0xa9/0xc0
+ [1191992.900438] [<ffffffff81691238>] genl_rcv+0x28/0x40
+ [1191992.900439] [<ffffffff816903f3>] netlink_unicast+0xf3/0x200
+ [1191992.900441] [<ffffffff81690815>] netlink_sendmsg+0x315/0x680
+ [1191992.900448] [<ffffffff81333d7d>] ? aa_sk_perm.isra.4+0x6d/0x150
+ [1191992.900452] [<ffffffff8164625e>] sock_aio_write+0xfe/0x130
+ [1191992.900456] [<ffffffff811d358a>] do_sync_write+0x5a/0x90
+ [1191992.900458] [<ffffffff811d4005>] vfs_write+0x195/0x1f0
+ [1191992.900461] [<ffffffff811d4ac6>] SyS_write+0x46/0xb0
+ [1191992.900464] [<ffffffff8176b66d>] system_call_fastpath+0x1a/0x1f
+ [1191992.900465] sending NMI to all CPUs:
- Linux bluegrass4 3.16.0-31-generic #41~14.04.1-Ubuntu SMP Wed Feb 11
- 19:30:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
-
- Kernel panic, dump at:
+ Dump at:
http://telsasoft.com/tmp/varcrash-201503271156.tar.bz2
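For anyone who wants to follow the sequence of DRBD state changes in a log like the one above, a small parser can pull the "field( old -> new )" transitions out of the kernel messages. This is only a sketch: the regular expressions are inferred from the lines quoted in this bug, the function and variable names are hypothetical, and other DRBD versions may format these messages differently.

```python
import re

# Matches e.g. "conn( Connected -> Disconnecting )".
TRANSITION = re.compile(r"(\w+)\(\s*(\w+)\s*->\s*(\w+)\s*\)")

# Matches "[timestamp] drbd <resource>: ..." and "[timestamp] block drbdN: ...".
LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(?:drbd|block)\s+([\w-]+):\s+(.*)")

# A few lines copied from the log in this bug report.
SAMPLE = """\
[1191759.892350] drbd vm-database: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[1191759.892359] drbd vm-database: asender terminated
[1191759.892480] drbd vm-database: conn( Disconnecting -> StandAlone )
[1191759.892497] block drbd6: disk( UpToDate -> Failed )
"""


def parse_transitions(log_text):
    """Return (timestamp, subject, field, old, new) tuples, in log order."""
    events = []
    for raw in log_text.splitlines():
        match = LINE.search(raw)
        if not match:
            continue  # not a DRBD resource/device message
        timestamp, subject, rest = match.groups()
        for field, old, new in TRANSITION.findall(rest):
            events.append((timestamp, subject, field, old, new))
    return events


if __name__ == "__main__":
    for event in parse_transitions(SAMPLE):
        print(*event)
```

Timestamps are kept as strings so they can be matched back against the original log text without rounding.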
** Description changed:
The following kernel panic was brought to my attention:
+ SYSTEM MAP: /boot/System.map-3.16.0-31-generic
+ DEBUG KERNEL: /usr/lib/debug/boot/vmlinux-3.16.0-31-generic
+ DUMPFILE: ./dump.201503271156 [PARTIAL DUMP]
+ CPUS: 12
+ DATE: Fri Mar 27 12:56:49 2015
+ UPTIME: 13 days, 19:14:00
+ LOAD AVERAGE: 1.72, 0.67, 0.28
+ TASKS: 340
+ NODENAME: bluegrass3
+ RELEASE: 3.16.0-31-generic
+ VERSION: #41~14.04.1-Ubuntu SMP Wed Feb 11 19:30:13 UTC 2015
+ MACHINE: x86_64 (2397 Mhz)
+ MEMORY: 63.9 GB
+ PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
+
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1439872
Title:
kernel panic involving drbd
Status in linux package in Ubuntu:
Incomplete
Bug description:
The following kernel panic was brought to my attention:
SYSTEM MAP: /boot/System.map-3.16.0-31-generic
DEBUG KERNEL: /usr/lib/debug/boot/vmlinux-3.16.0-31-generic
DUMPFILE: ./dump.201503271156 [PARTIAL DUMP]
CPUS: 12
DATE: Fri Mar 27 12:56:49 2015
UPTIME: 13 days, 19:14:00
LOAD AVERAGE: 1.72, 0.67, 0.28
TASKS: 340
NODENAME: bluegrass3
RELEASE: 3.16.0-31-generic
VERSION: #41~14.04.1-Ubuntu SMP Wed Feb 11 19:30:13 UTC 2015
MACHINE: x86_64 (2397 Mhz)
MEMORY: 63.9 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
[1191751.738854] request: minor=1, resource=vm-appserver; but that minor belongs to resource libvirt
[1191759.892350] drbd vm-database: peer( Secondary -> Unknown ) conn( Connected -> Disconnecting ) pdsk( UpToDate -> DUnknown )
[1191759.892359] drbd vm-database: asender terminated
[1191759.892362] drbd vm-database: Terminating drbd_a_vm-datab
[1191759.892471] drbd vm-database: Connection closed
[1191759.892480] drbd vm-database: conn( Disconnecting -> StandAlone )
[1191759.892481] drbd vm-database: receiver terminated
[1191759.892485] drbd vm-database: Terminating drbd_r_vm-datab
[1191759.892497] block drbd6: disk( UpToDate -> Failed )
[1191759.902311] block drbd6: bitmap WRITE of 0 pages took 0 jiffies
[1191759.902315] block drbd6: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[1191759.902322] block drbd6: disk( Failed -> Diskless )
[1191759.902565] block drbd6: drbd_bm_resize called with capacity == 0
[1191759.902585] drbd vm-database: Terminating drbd_w_vm-datab
[1191992.802513] INFO: task drbdsetup:20254 blocked for more than 120 seconds.
[1191992.834141] Not tainted 3.16.0-31-generic #41~14.04.1-Ubuntu
[1191992.862889] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1191992.900367] drbdsetup D ffff88085fc53440 0 20254 1 0x00000004
[1191992.900371] ffff880845967ab8 0000000000000086 ffff880abe65c750 ffff880845967fd8
[1191992.900374] 0000000000013440 0000000000013440 ffff8808546d65e0 ffff8808b5365800
[1191992.900378] 0000000000001e00 0000000000000400 000000000000000e ffff8808b5365a30
[1191992.900388] Call Trace:
[1191992.900398] [<ffffffff817675c9>] schedule+0x29/0x70
[1191992.900407] [<ffffffffc078fba5>] _drbd_request_state+0x65/0xb0 [drbd]
[1191992.900413] [<ffffffff810b4d10>] ? prepare_to_wait_event+0x100/0x100
[1191992.900418] [<ffffffffc0787a3e>] adm_detach.part.52+0x3e/0x100 [drbd]
[1191992.900422] [<ffffffffc07842f1>] ? drbd_adm_prepare.isra.48+0xd1/0x4e0 [drbd]
[1191992.900426] [<ffffffffc0787c49>] drbd_adm_detach+0x149/0x150 [drbd]
[1191992.900431] [<ffffffff81692a39>] genl_family_rcv_msg+0x199/0x380
[1191992.900432] [<ffffffff81692c20>] ? genl_family_rcv_msg+0x380/0x380
[1191992.900434] [<ffffffff81692cb1>] genl_rcv_msg+0x91/0xd0
[1191992.900436] [<ffffffff81690d39>] netlink_rcv_skb+0xa9/0xc0
[1191992.900438] [<ffffffff81691238>] genl_rcv+0x28/0x40
[1191992.900439] [<ffffffff816903f3>] netlink_unicast+0xf3/0x200
[1191992.900441] [<ffffffff81690815>] netlink_sendmsg+0x315/0x680
[1191992.900448] [<ffffffff81333d7d>] ? aa_sk_perm.isra.4+0x6d/0x150
[1191992.900452] [<ffffffff8164625e>] sock_aio_write+0xfe/0x130
[1191992.900456] [<ffffffff811d358a>] do_sync_write+0x5a/0x90
[1191992.900458] [<ffffffff811d4005>] vfs_write+0x195/0x1f0
[1191992.900461] [<ffffffff811d4ac6>] SyS_write+0x46/0xb0
[1191992.900464] [<ffffffff8176b66d>] system_call_fastpath+0x1a/0x1f
[1191992.900465] sending NMI to all CPUs:
Dump at:
http://telsasoft.com/tmp/varcrash-201503271156.tar.bz2
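When reading the hung-task call trace above, it can help to separate the frames that belong to the drbd module from the rest, and to set aside the entries marked with "?", which the kernel prints for addresses found on the stack that it could not confirm as part of the call chain. The following sketch does that for traces in the 3.16-era format shown here; the regular expression and the function name are assumptions based only on the lines in this report.

```python
import re

# "[<address>] [? ]symbol+0xoffset/0xsize [module]" as printed in the trace above.
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<sym>[\w.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

# The first frames of the trace quoted in this bug report.
SAMPLE_TRACE = """\
[1191992.900398] [<ffffffff817675c9>] schedule+0x29/0x70
[1191992.900407] [<ffffffffc078fba5>] _drbd_request_state+0x65/0xb0 [drbd]
[1191992.900413] [<ffffffff810b4d10>] ? prepare_to_wait_event+0x100/0x100
[1191992.900418] [<ffffffffc0787a3e>] adm_detach.part.52+0x3e/0x100 [drbd]
[1191992.900422] [<ffffffffc07842f1>] ? drbd_adm_prepare.isra.48+0xd1/0x4e0 [drbd]
[1191992.900426] [<ffffffffc0787c49>] drbd_adm_detach+0x149/0x150 [drbd]
"""


def parse_frames(trace_text):
    """Return one dict per stack frame found in the trace text."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME.search(line)
        if match:
            frames.append({
                "symbol": match.group("sym"),
                # None means the symbol is in the core kernel image.
                "module": match.group("module"),
                "reliable": match.group("unreliable") is None,
            })
    return frames


if __name__ == "__main__":
    for frame in parse_frames(SAMPLE_TRACE):
        if frame["module"] == "drbd" and frame["reliable"]:
            print(frame["symbol"])
```

Run on the sample, the reliable drbd frames are drbd_adm_detach, adm_detach.part.52 and _drbd_request_state (innermost last in call order), which matches the picture of drbdsetup sleeping inside a detach request.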
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1439872/+subscriptions