
kernel-packages team mailing list archive

[Bug 1382801] Re: XFS: mount hangs for corrupted filesystem

 

========
ANALYSIS
========

********
#0 [ffff880036a73b38] schedule at ffffffff8175e320
#1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
#2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
#3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
#4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]

Analyzing the stack trace, we can see that, during an xfs_fs_mount, the only
possible way to reach "xfs_log_unmount" from xfs_mountfs is through an error
path, which here means the XFS filesystem is CORRUPTED.

636 xfs_mountfs(
637 xfs_mount_t *mp)
638 {
639 xfs_sb_t *sbp = &(mp->m_sb);
640 xfs_inode_t *rip;
...
836 /*
837 * Get and sanity-check the root inode.
838 * Save the pointer to it in the mount structure.
839 */
840 error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL, &rip);
841 if (error) {
842 xfs_warn(mp, "failed to read root inode");
843 goto out_log_dealloc;
844 }
845
...
955 out_log_dealloc:
956 xfs_log_unmount(mp);
...

So we DO KNOW your XFS is considered to be CORRUPTED (by XFS function
xfs_iget(), called for the root inode as a sanity check).
********
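
As a rough illustration of the error path described above, here is a small
user-space model of the goto-style unwinding. It is only a sketch: every
model_* name is hypothetical, the -EIO value is a placeholder (the real error
code returned by xfs_iget() is not known from the trace), and nothing here is
the actual kernel code.

```c
#include <errno.h>
#include <stdio.h>

/* Counts how many times the (modelled) log teardown ran. */
static int log_unmount_calls;

/* Hypothetical stand-in for reading the root inode: fails when asked to. */
static int model_read_root_inode(int root_inode_bad)
{
    return root_inode_bad ? -EIO : 0;   /* placeholder error code */
}

/* Hypothetical stand-in for xfs_log_unmount(): only records the call. */
static void model_log_unmount(void)
{
    log_unmount_calls++;
}

/*
 * Model of the relevant part of the mount path: a failed root inode
 * read jumps to the cleanup label, which tears the log down again.
 */
int model_mountfs(int root_inode_bad)
{
    int error;

    error = model_read_root_inode(root_inode_bad);
    if (error) {
        fprintf(stderr, "failed to read root inode\n");
        goto out_log_dealloc;
    }
    return 0;

out_log_dealloc:
    model_log_unmount();
    return error;
}
```

Calling model_mountfs(1) takes the out_log_dealloc path and runs the teardown
once, which corresponds to frames #3 and #4 of the trace; model_mountfs(0)
never reaches it.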

Either way, let's continue debugging to understand why XFS did not return
an error saying that the filesystem is corrupted:

Following the stack:

#2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
#3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]

We can see that xfs_log_quiesce calls

#1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]

This is the function responsible for pushing the ENTIRE AIL (Active Item
List) to disk (for unmount and freeze purposes).

The code of this function is simple:

600 struct xfs_log_item *lip;
601 DEFINE_WAIT(wait);
602
603 spin_lock(&ailp->xa_lock);
604 while ((lip = xfs_ail_max(ailp)) != NULL) {
605 prepare_to_wait(&ailp->xa_empty, &wait, TASK_UNINTERRUPTIBLE);
606 ailp->xa_target = lip->li_lsn;
607 wake_up_process(ailp->xa_task);
608 spin_unlock(&ailp->xa_lock);
609 schedule();
610 spin_lock(&ailp->xa_lock);
611 }
612 spin_unlock(&ailp->xa_lock);
613
614 finish_wait(&ailp->xa_empty, &wait);

On each pass it takes the last "xfs_log_item" from the AIL doubly linked
list, sets the push target to that item's LSN, and wakes the task
responsible for committing these log items to disk:

607 wake_up_process(ailp->xa_task);
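
To make the role of xfs_ail_max() in that loop more concrete, here is a
minimal user-space sketch of an LSN-ordered doubly linked list whose "max" is
simply its tail. This is an assumption-laden model, not the kernel
implementation: the model_* types and functions are hypothetical, and it
assumes items are appended in ascending LSN order.

```c
#include <stddef.h>

/* Hypothetical log item: only the LSN and the list links are modelled. */
struct model_item {
    unsigned long long lsn;
    struct model_item *prev, *next;
};

/* Hypothetical AIL: head holds the lowest LSN, tail the highest. */
struct model_list {
    struct model_item *head;
    struct model_item *tail;
};

/* Append an item; callers are assumed to insert in ascending LSN order. */
void model_list_append(struct model_list *l, struct model_item *it)
{
    it->next = NULL;
    it->prev = l->tail;
    if (l->tail)
        l->tail->next = it;
    else
        l->head = it;
    l->tail = it;
}

/* Highest-LSN item, or NULL when the list is empty (loop exit condition). */
struct model_item *model_list_max(const struct model_list *l)
{
    return l->tail;
}
```

In this model the wait loop can only terminate once model_list_max() returns
NULL, that is, once every item has been removed from the list by whoever
writes the items back.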

POSSIBLE XFS BUG:

XFS can get stuck inside this loop if something goes wrong in the AIL
worker (xa_task), which is responsible for committing the xfs log items:
as long as the AIL is not empty, the loop never exits. This, of course,
makes the "mount" process hang in the "UNINTERRUPTIBLE" state (since it is
not safe to let userland kill this process).
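
The suspected hang can be pictured with a small single-threaded model. This
is a hedged sketch only: the model_* names, the worker_stuck flag, and the
max_wakeups bound are all invented for illustration (the real loop has no
such bound, which is exactly why it can hang), and the real worker runs as a
separate kernel thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy AIL: just a count of log items still waiting to be written back. */
struct model_ail {
    size_t pending;      /* items not yet removed from the list */
    bool worker_stuck;   /* true: the push worker never retires an item */
};

/* Stand-in for waking the push worker: retires one item unless stuck. */
static void model_worker_run(struct model_ail *ail)
{
    if (!ail->worker_stuck && ail->pending > 0)
        ail->pending--;
}

/*
 * Model of the wait loop: keep waking the worker until the list is empty.
 * Returns the number of wakeups used, or -1 if max_wakeups was reached
 * with items still pending (the real loop would keep sleeping forever).
 */
long model_push_all_sync(struct model_ail *ail, long max_wakeups)
{
    long wakeups = 0;

    while (ail->pending > 0) {
        if (wakeups == max_wakeups)
            return -1;
        model_worker_run(ail);
        wakeups++;
    }
    return wakeups;
}
```

With a healthy worker the loop drains the list and returns; with
worker_stuck set, the pending count never changes and only the artificial
bound ends the loop.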

OBS: Waiting for the core file so I can get a stack trace of this XFS worker
and check whether it is hung and, if so, why.

OBS: If XFS hangs on mount for a corrupted XFS filesystem, this behavior
is a bug, since the "mount" command should have returned an error saying
that the filesystem did not pass the root-inode sanity check.

Working on this...


** Changed in: linux (Ubuntu)
       Status: Incomplete => Confirmed

** Changed in: linux (Ubuntu)
     Assignee: (unassigned) => Rafael David Tinoco (inaddy)

** Tags added: cts

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1382801

Title:
  XFS: mount hangs for corrupted filesystem

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  The following situation was brought to my attention:

  --------
  mount hangs at the following stack:
  crash> bt 2882
  PID: 2882 TASK: ffff88084e75c800 CPU: 7 COMMAND: "mount"
  #0 [ffff880036a73b38] schedule at ffffffff8175e320
  #1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
  #2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
  #3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
  #4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]
  #5 [ffff880036a73ce0] xfs_fs_fill_super at ffffffffa0296707 [xfs]
  #6 [ffff880036a73d20] mount_bdev at ffffffff811cd4a9
  #7 [ffff880036a73db0] xfs_fs_mount at ffffffffa02946f5 [xfs]
  #8 [ffff880036a73dc0] mount_fs at ffffffff811ce123
  #9 [ffff880036a73e10] vfs_kern_mount at ffffffff811e9bf6
  #10 [ffff880036a73e60] do_new_mount at ffffffff811eb3a4
  #11 [ffff880036a73ec0] do_mount at ffffffff811ec706
  #12 [ffff880036a73f20] sys_mount at ffffffff811ecad0
  #13 [ffff880036a73f80] system_call_fastpath at ffffffff8176ae2d
  RIP: 00007f2340eb6c2a RSP: 00007fff25675368 RFLAGS: 00010206
  RAX: 00000000000000a5 RBX: ffffffff8176ae2d RCX: 0000000000000026
  RDX: 0000000000b04c20 RSI: 0000000000b04bf0 RDI: 0000000000b04bd0
  RBP: 00000000c0ed0400 R8: 0000000000b04c70 R9: 0000000000000001
  R10: ffffffffc0ed0400 R11: 0000000000000202 R12: 0000000000b04bf0
  R13: 0000000000b04b50 R14: 0000000000000400 R15: 0000000000000000
  ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b

  The corresponding disk is /dev/sdd1. Any I/O against it (xfs_check, etc.)
  also hangs, with the process in "D" state.

  This is reproducible with both the 3.11 and 3.13 kernels.

  The storage node is out of service because of this problem.
  --------

  I'm still asking for more data (sosreport and kernel dump).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1382801/+subscriptions

