kernel-packages team mailing list archive
Message #85233
[Bug 1382801] Re: XFS: mount hangs for corrupted filesystem
========
ANALYSIS
========
********
#0 [ffff880036a73b38] schedule at ffffffff8175e320
#1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
#2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
#3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
#4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]
Analyzing the stack trace, we can see that during an xfs_fs_mount the only
possible way to get into "xfs_log_unmount" is if the XFS filesystem is
CORRUPTED.
xfs_mountfs(
    xfs_mount_t *mp)
{
    xfs_sb_t *sbp = &(mp->m_sb);
    xfs_inode_t *rip;
...
    /*
     * Get and sanity-check the root inode.
     * Save the pointer to it in the mount structure.
     */
    error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
    if (error) {
        xfs_warn(mp, "failed to read root inode");
        goto out_log_dealloc;
    }

...
out_log_dealloc:
    xfs_log_unmount(mp);
...
So we DO KNOW your XFS is considered to be CORRUPTED (by XFS function
xfs_iget(), called for the root inode as a sanity check).
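The error path described above can be illustrated with a small user-space sketch. It is only a model of the control flow, not the kernel code: `struct mount_sketch`, `read_root_inode()`, `log_unmount()` and `mountfs_sketch()` are hypothetical names invented here, and the `-EIO` return value is an arbitrary stand-in for whatever error xfs_iget() reports.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the mount structure; not the real XFS type. */
struct mount_sketch {
    int root_inode_ok;   /* 0 simulates a corrupted root inode */
    int log_mounted;     /* 1 while the (simulated) log is active */
};

/* Models xfs_iget() on the root inode: fails when the inode is bad.
 * The -EIO value is an assumption chosen for illustration only. */
static int read_root_inode(const struct mount_sketch *mp)
{
    return mp->root_inode_ok ? 0 : -EIO;
}

/* Models xfs_log_unmount(): tears the log down on the error path. */
static void log_unmount(struct mount_sketch *mp)
{
    mp->log_mounted = 0;
}

/* Models the part of xfs_mountfs() quoted in the analysis. */
static int mountfs_sketch(struct mount_sketch *mp)
{
    int error;

    mp->log_mounted = 1;

    error = read_root_inode(mp);
    if (error) {
        fprintf(stderr, "failed to read root inode\n");
        goto out_log_dealloc;
    }
    return 0;

out_log_dealloc:
    /* In the real trace, this is the call that never returns. */
    log_unmount(mp);
    return error;
}
```

Calling `mountfs_sketch()` on a structure with `root_inode_ok` set to 0 takes the `out_log_dealloc` path and returns the error. In this sketch the cleanup returns immediately, whereas in the reported hang it blocks.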
********
Either way, let's continue debugging to make sure we understand why XFS
didn't give us an error about the filesystem being corrupted:
Following the stack:
#2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
#3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
We can see that xfs_log_quiesce calls
#1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
This is the function responsible for pushing ALL items on the AIL (Active
Item List) to disk (for unmount and freeze purposes).
Its code is simple:
    struct xfs_log_item *lip;
    DEFINE_WAIT(wait);

    spin_lock(&ailp->xa_lock);
    while ((lip = xfs_ail_max(ailp)) != NULL) {
        prepare_to_wait(&ailp->xa_empty, &wait, TASK_UNINTERRUPTIBLE);
        ailp->xa_target = lip->li_lsn;
        wake_up_process(ailp->xa_task);
        spin_unlock(&ailp->xa_lock);
        schedule();
        spin_lock(&ailp->xa_lock);
    }
    spin_unlock(&ailp->xa_lock);

    finish_wait(&ailp->xa_empty, &wait);
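As long as the AIL doubly linked list is not empty, it takes the last
"xfs_log_item" on the list, sets that item's LSN as the push target, and
wakes the task responsible for committing these log items to disk:
    wake_up_process(ailp->xa_task);
It then sleeps in schedule() until it is woken again.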
POSSIBLE XFS BUG:
XFS can get stuck inside this loop because of something happening in the
AIL (xa_task) callback function (responsible for committing XFS log items).
This, of course, makes the "mount" process hang in the "UNINTERRUPTIBLE"
state (since it's not safe to let userland kill this process).
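To make the suspected failure mode concrete, here is a minimal single-threaded model of the wait loop. Everything in it is an assumption for illustration: `struct ail_sketch`, `worker_run_once()` and `push_all_sync_sketch()` are hypothetical names, and the `max_wakeups` bound exists only so that the sketch terminates (the kernel loop has no such bound).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, single-threaded model of the AIL; not the kernel structures. */
struct ail_sketch {
    size_t nitems;        /* log items still on the list */
    bool   worker_stuck;  /* true models a push worker that makes no progress */
};

/* Models one wake-up of the push worker (xa_task):
 * retires one item unless the worker is stuck. */
static void worker_run_once(struct ail_sketch *ailp)
{
    if (!ailp->worker_stuck && ailp->nitems > 0)
        ailp->nitems--;
}

/*
 * Models the wait loop: keep waking the worker until the list is empty.
 * Returns true if the list drained, false if the bound was hit first.
 */
static bool push_all_sync_sketch(struct ail_sketch *ailp, size_t max_wakeups)
{
    size_t wakeups = 0;

    while (ailp->nitems != 0) {
        if (wakeups++ == max_wakeups)
            return false;   /* the kernel loop would keep sleeping here */
        worker_run_once(ailp);
    }
    return true;
}
```

With a working worker the list drains and the function returns true. With `worker_stuck` set, the item count never changes and only the artificial bound ends the loop, which corresponds to the "mount" process sleeping indefinitely.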
OBS: Waiting for the core file so I can stack-trace this XFS worker to check
whether it is hung and, if so, why.
OBS: If XFS hangs on mount for a corrupted XFS filesystem, this behavior
is a bug, since the "mount" command should have returned an error saying
the filesystem did not pass the root-inode sanity check.
Working on this...
** Changed in: linux (Ubuntu)
Status: Incomplete => Confirmed
** Changed in: linux (Ubuntu)
Assignee: (unassigned) => Rafael David Tinoco (inaddy)
** Tags added: cts
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1382801
Title:
XFS: mount hangs for corrupted filesystem
Status in “linux” package in Ubuntu:
Confirmed
Bug description:
The following situation was brought to my attention:
--------
mount hangs at the following stack:
crash> bt 2882
PID: 2882 TASK: ffff88084e75c800 CPU: 7 COMMAND: "mount"
#0 [ffff880036a73b38] schedule at ffffffff8175e320
#1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
#2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
#3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
#4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]
#5 [ffff880036a73ce0] xfs_fs_fill_super at ffffffffa0296707 [xfs]
#6 [ffff880036a73d20] mount_bdev at ffffffff811cd4a9
#7 [ffff880036a73db0] xfs_fs_mount at ffffffffa02946f5 [xfs]
#8 [ffff880036a73dc0] mount_fs at ffffffff811ce123
#9 [ffff880036a73e10] vfs_kern_mount at ffffffff811e9bf6
#10 [ffff880036a73e60] do_new_mount at ffffffff811eb3a4
#11 [ffff880036a73ec0] do_mount at ffffffff811ec706
#12 [ffff880036a73f20] sys_mount at ffffffff811ecad0
#13 [ffff880036a73f80] system_call_fastpath at ffffffff8176ae2d
RIP: 00007f2340eb6c2a RSP: 00007fff25675368 RFLAGS: 00010206
RAX: 00000000000000a5 RBX: ffffffff8176ae2d RCX: 0000000000000026
RDX: 0000000000b04c20 RSI: 0000000000b04bf0 RDI: 0000000000b04bd0
RBP: 00000000c0ed0400 R8: 0000000000b04c70 R9: 0000000000000001
R10: ffffffffc0ed0400 R11: 0000000000000202 R12: 0000000000b04bf0
R13: 0000000000b04b50 R14: 0000000000000400 R15: 0000000000000000
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b
The corresponding disk is /dev/sdd1; any I/O on it (xfs_check, etc.) also
hangs in the "D" state.
This is reproducible with both the 3.11 and 3.13 kernels.
The storage node is out of service because of this problem.
--------
I'm still asking for more data (sosreport and kernel dump).