kernel-packages team mailing list archive
Message #85898
[Bug 1382801] Re: XFS: mount hangs for corrupted filesystem
Regarding the mount hang (and not what caused the corruption):
I'm waiting for the following data:
## terminal 1
# apt-get install trace-cmd
# trace-cmd stop && trace-cmd reset
# trace-cmd record -e '*xfs*'
(it keeps recording until you press ctrl+c)
## terminal 2
# mount /dev/sdd1 (which I believe is the corrupted one)
## terminal 1 (after 5 seconds)
# trace-cmd report -i ./trace.dat | gzip > trace_xfs_ail.gz
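Once the report exists, a quick tally of which events the xfsaild thread
keeps emitting can be a first pass before reading the whole trace. The
following is a minimal sketch, not part of the requested procedure. It
assumes each report line has the shape
"<task>-<pid> [<cpu>] <timestamp>: <event>: <details>"; the helper name
count_events and its task_prefix parameter are made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape of `trace-cmd report` output (an assumption, adjust if
# your trace-cmd version prints something different):
#   <task>-<pid>  [<cpu>]  <timestamp>: <event>: <details>
LINE_RE = re.compile(
    r"^\s*(?P<task>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):"
)


def count_events(lines, task_prefix="xfsaild/"):
    """Count events per event name for tasks whose name starts with task_prefix.

    Lines that do not match the assumed shape (headers, blank lines) are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("task").startswith(task_prefix):
            counts[m.group("event")] += 1
    return counts


# Hypothetical usage against the file produced above:
#
#   import gzip
#   with gzip.open("trace_xfs_ail.gz", "rt") as fh:
#       for event, n in count_events(fh).most_common():
#           print(n, event)
```

If one event dominates the tally for xfsaild/sdd1, that is probably the
place to start reading the raw trace.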
--------------------
The idea here is this:
crash> set 2698
PID: 2698
COMMAND: "mount"
TASK: ffff88084f301800 [THREAD_INFO: ffff88104fe7c000]
CPU: 11
STATE: TASK_UNINTERRUPTIBLE|TASK_TRACED|EXIT_DEAD
crash> bt
PID: 2698 TASK: ffff88084f301800 CPU: 11 COMMAND: "mount"
#0 [ffff88104fe7db38] __schedule at ffffffff8175ded3
#1 [ffff88104fe7dbb0] schedule at ffffffff8175e349
#2 [ffff88104fe7dbc0] xfs_ail_push_all_sync at ffffffffa02f2478 [xfs]
#3 [ffff88104fe7dc30] xfs_log_quiesce at ffffffffa02edb67 [xfs]
#4 [ffff88104fe7dc50] xfs_log_unmount at ffffffffa02edbb6 [xfs]
#5 [ffff88104fe7dc70] xfs_mountfs at ffffffffa02a032a [xfs]
#6 [ffff88104fe7dce0] xfs_fs_fill_super at ffffffffa02a3707 [xfs]
#7 [ffff88104fe7dd20] mount_bdev at ffffffff811cd4a9
#8 [ffff88104fe7ddb0] xfs_fs_mount at ffffffffa02a16f5 [xfs]
#9 [ffff88104fe7ddc0] mount_fs at ffffffff811ce123
#10 [ffff88104fe7de10] vfs_kern_mount at ffffffff811e9bf6
#11 [ffff88104fe7de60] do_new_mount at ffffffff811eb3a4
#12 [ffff88104fe7dec0] do_mount at ffffffff811ec706
#13 [ffff88104fe7df20] sys_mount at ffffffff811ecad0
#14 [ffff88104fe7df80] system_call_fastpath at ffffffff8176ae2d
RIP: 00007fe6f5572c2a RSP: 00007fffcdb172c8 RFLAGS: 00010202
RAX: 00000000000000a5 RBX: ffffffff8176ae2d RCX: 0000000000000026
RDX: 0000000000645ae0 RSI: 0000000000645ab0 RDI: 0000000000645a90
RBP: 00000000c0ed0400 R8: 0000000000645b30 R9: 0000000000000001
R10: ffffffffc0ed0400 R11: 0000000000000202 R12: 0000000000645ab0
R13: 0000000000645a10 R14: 0000000000000400 R15: 0000000000000000
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b
The mount hangs because xfs_ail_push_all_sync is not able to flush the
"AIL" (Active Item List, called the intent log below) from the journal.
But xfs_ail_push_all_sync does not do the flushing itself. It calls:
wake_up_process(ailp->xa_task);
xa_task here is a kernel thread called "xfsaild/<disk>":
int
xfs_trans_ail_init(
...
    ailp->xa_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
            ailp->xa_mount->m_fsname);
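The handoff described here (the mount path wakes the worker thread and
then sleeps until the list is empty) can be modelled in userspace. This
is only a toy sketch in Python, not the kernel code: the names ToyAIL,
aild, push_all_sync and try_mount are invented, and the timeout exists
only so the demo terminates. The kernel wait in the backtrace above has
no timeout, which is why the real mount never returns.

```python
import threading


class ToyAIL:
    """Userspace stand-in for the AIL plus its worker thread (illustrative only)."""

    def __init__(self, items):
        self.items = dict(items)           # name -> True if the item can be flushed
        self.cond = threading.Condition()  # stands in for "wait until the AIL is empty"
        self.kick = threading.Event()      # stands in for wake_up_process()
        self.stopping = False

    def aild(self):
        """Worker loop: sleep until kicked or timed out, then try to flush."""
        while True:
            self.kick.wait(timeout=0.01)
            self.kick.clear()
            with self.cond:
                if self.stopping:
                    return
                # Drop every item that can be flushed; stuck items stay on the list.
                self.items = {n: ok for n, ok in self.items.items() if not ok}
                if not self.items:
                    self.cond.notify_all()

    def push_all_sync(self, timeout):
        """Kick the worker, then wait for an empty list. Returns False on timeout."""
        with self.cond:
            self.kick.set()
            return self.cond.wait_for(lambda: not self.items, timeout=timeout)


def try_mount(items, timeout=0.2):
    """Run one waiter against one worker and report whether the list emptied."""
    ail = ToyAIL(items)
    worker = threading.Thread(target=ail.aild)
    worker.start()
    emptied = ail.push_all_sync(timeout)
    with ail.cond:
        ail.stopping = True
    ail.kick.set()
    worker.join()
    return emptied
```

With only flushable items try_mount returns True. With a single item
that can never be flushed it returns False after the timeout, which is
the toy equivalent of the hang.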
So, digging into this kernel thread, which is responsible for committing
the intent log:
crash> ps | grep saild
2354 2 3 ffff8808507fb000 IN 0.0 0 0 [xfsaild/sdb1]
2588 2 6 ffff88084717c800 IN 0.0 0 0 [xfsaild/sdh1]
2703 2 9 ffff881042250000 IN 0.0 0 0 [xfsaild/sdd1]
I can see that process 2703 is the one being woken up. Analyzing its stack:
crash> bt
PID: 2703 TASK: ffff881042250000 CPU: 9 COMMAND: "xfsaild/sdd1"
#0 [ffff88103b72fd38] __schedule at ffffffff8175ded3
#1 [ffff88103b72fdb0] schedule at ffffffff8175e349
#2 [ffff88103b72fdc0] schedule_timeout at ffffffff8175d55d
#3 [ffff88103b72fe70] xfsaild at ffffffffa02f2238 [xfs]
#4 [ffff88103b72fec0] kthread at ffffffff8108fb59
#5 [ffff88103b72ff50] ret_from_fork at ffffffff8176ad7c
I can see that schedule is being called by "schedule_timeout", so xfsaild
is putting itself back to sleep for "tout" msecs.
if (tout)
    schedule_timeout(msecs_to_jiffies(tout));
tout (in msecs) is the return value of this call:
tout = xfsaild_push(ailp);
and "xfsaild_push" is the function responsible for flushing the "ail",
the intent log.
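The shape of that loop (push, get a timeout back, sleep, push again) can
be sketched as a deterministic simulation. This is a toy model, not
xfsaild_push itself: the function name simulate_aild is invented and the
timeout values 10, 20 and 50 are illustrative assumptions, not taken
from the excerpt above.

```python
def simulate_aild(items, max_passes=5):
    """Simulate repeated push passes over a set of items.

    items: dict mapping item name -> True if the item can be flushed.
    Returns one (remaining_items, tout_ms) tuple per pass.
    """
    remaining = dict(items)
    history = []
    for _ in range(max_passes):
        before = len(remaining)
        # One push pass: flushable items leave the list, stuck ones stay.
        remaining = {n: ok for n, ok in remaining.items() if not ok}
        if not remaining:
            tout = 50  # nothing left: idle (illustrative value)
        elif len(remaining) == before:
            tout = 20  # no progress this pass: back off (illustrative value)
        else:
            tout = 10  # some progress: retry soon (illustrative value)
        history.append((len(remaining), tout))
        if not remaining:
            break
    return history
```

In this model a list with one stuck item never reaches zero remaining
items, no matter how many passes run, so a waiter that needs an empty
list would sleep forever.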
With the trace I asked for, I'll be able to see WHY xfsaild_push is not
emptying the intent log (which xfs_ail_push_all_sync needs before it
can return).
This will help to clarify two things:
1) why xfs cannot flush its intent log
2) type of corruption that makes xfs behave like this
Waiting for more data to continue...
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1382801
Title:
XFS: mount hangs for corrupted filesystem
Status in “linux” package in Ubuntu:
Confirmed
Bug description:
This situation was brought to my attention:
--------
mount hangs at the following stack:
crash> bt 2882
PID: 2882 TASK: ffff88084e75c800 CPU: 7 COMMAND: "mount"
#0 [ffff880036a73b38] schedule at ffffffff8175e320
#1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
#2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
#3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
#4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]
#5 [ffff880036a73ce0] xfs_fs_fill_super at ffffffffa0296707 [xfs]
#6 [ffff880036a73d20] mount_bdev at ffffffff811cd4a9
#7 [ffff880036a73db0] xfs_fs_mount at ffffffffa02946f5 [xfs]
#8 [ffff880036a73dc0] mount_fs at ffffffff811ce123
#9 [ffff880036a73e10] vfs_kern_mount at ffffffff811e9bf6
#10 [ffff880036a73e60] do_new_mount at ffffffff811eb3a4
#11 [ffff880036a73ec0] do_mount at ffffffff811ec706
#12 [ffff880036a73f20] sys_mount at ffffffff811ecad0
#13 [ffff880036a73f80] system_call_fastpath at ffffffff8176ae2d
RIP: 00007f2340eb6c2a RSP: 00007fff25675368 RFLAGS: 00010206
RAX: 00000000000000a5 RBX: ffffffff8176ae2d RCX: 0000000000000026
RDX: 0000000000b04c20 RSI: 0000000000b04bf0 RDI: 0000000000b04bd0
RBP: 00000000c0ed0400 R8: 0000000000b04c70 R9: 0000000000000001
R10: ffffffffc0ed0400 R11: 0000000000000202 R12: 0000000000b04bf0
R13: 0000000000b04b50 R14: 0000000000000400 R15: 0000000000000000
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b
The corresponding disk is /dev/sdd1; any I/O (xfs_check, etc.) also
hangs and ends up in "D" state.
This is reproducible with both the 3.11 and 3.13 kernels.
The storage node is out of service because of this problem
--------
I'm still asking for more data (sosreport and kernel dump).
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1382801/+subscriptions