kernel-packages team mailing list archive

[Bug 1382801] Re: XFS: mount hangs for corrupted filesystem

 

Regarding the mount hang (and not what caused the corruption):

I'm waiting for the following data:

## terminal 1

# apt-get install trace-cmd
# trace-cmd stop && trace-cmd reset
# trace-cmd record -e '*xfs*'

(it keeps recording until you press Ctrl+C)

## terminal 2

# mount /dev/sdd1 (which I believe is the corrupted one)

## terminal 1 (press Ctrl+C after 5 seconds, then run)

# trace-cmd report -i ./trace.dat | gzip > trace_xfs_ail.gz
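
If juggling two terminals is inconvenient, the same capture can be
sketched as a single shell function. This is only a sketch under some
assumptions: it runs as root, trace-cmd is installed, the device has an
/etc/fstab entry (as the bare "mount /dev/sdd1" above implies), and
recording is bounded by a "sleep" command instead of Ctrl+C. The
function name and the 10 second default are made up for illustration.

```shell
#!/bin/sh
# Sketch only: single-shell variant of the two-terminal capture.
# capture_xfs_trace is a hypothetical helper name, not part of trace-cmd.

capture_xfs_trace() {
    dev="$1"            # e.g. /dev/sdd1
    secs="${2:-10}"     # how long to keep recording (assumed default)

    # Clear any previous tracing state, as in the manual steps.
    trace-cmd stop && trace-cmd reset

    # Record XFS events system-wide; recording ends when "sleep" exits,
    # so no Ctrl+C is needed.
    trace-cmd record -e '*xfs*' -o trace.dat sleep "$secs" &
    rec_pid=$!

    # Give the recorder a moment to start, then issue the mount.
    sleep 2
    mount "$dev" &      # expected to block; left running in the background

    # Wait for the recorder to finish, then produce the compressed report.
    wait "$rec_pid"
    trace-cmd report -i trace.dat | gzip > trace_xfs_ail.gz
}

# Usage (as root):
#   capture_xfs_trace /dev/sdd1 10
```

The hung mount is left in the background on purpose, since it cannot
be interrupted while it sits in uninterruptible sleep.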

--------------------

The idea here is this:

crash> set 2698
PID: 2698
COMMAND: "mount"
TASK: ffff88084f301800 [THREAD_INFO: ffff88104fe7c000]
CPU: 11
STATE: TASK_UNINTERRUPTIBLE|TASK_TRACED|EXIT_DEAD
crash> bt
PID: 2698 TASK: ffff88084f301800 CPU: 11 COMMAND: "mount"
#0 [ffff88104fe7db38] __schedule at ffffffff8175ded3
#1 [ffff88104fe7dbb0] schedule at ffffffff8175e349
#2 [ffff88104fe7dbc0] xfs_ail_push_all_sync at ffffffffa02f2478 [xfs]
#3 [ffff88104fe7dc30] xfs_log_quiesce at ffffffffa02edb67 [xfs]
#4 [ffff88104fe7dc50] xfs_log_unmount at ffffffffa02edbb6 [xfs]
#5 [ffff88104fe7dc70] xfs_mountfs at ffffffffa02a032a [xfs]
#6 [ffff88104fe7dce0] xfs_fs_fill_super at ffffffffa02a3707 [xfs]
#7 [ffff88104fe7dd20] mount_bdev at ffffffff811cd4a9
#8 [ffff88104fe7ddb0] xfs_fs_mount at ffffffffa02a16f5 [xfs]
#9 [ffff88104fe7ddc0] mount_fs at ffffffff811ce123
#10 [ffff88104fe7de10] vfs_kern_mount at ffffffff811e9bf6
#11 [ffff88104fe7de60] do_new_mount at ffffffff811eb3a4
#12 [ffff88104fe7dec0] do_mount at ffffffff811ec706
#13 [ffff88104fe7df20] sys_mount at ffffffff811ecad0
#14 [ffff88104fe7df80] system_call_fastpath at ffffffff8176ae2d
RIP: 00007fe6f5572c2a RSP: 00007fffcdb172c8 RFLAGS: 00010202
RAX: 00000000000000a5 RBX: ffffffff8176ae2d RCX: 0000000000000026
RDX: 0000000000645ae0 RSI: 0000000000645ab0 RDI: 0000000000645a90
RBP: 00000000c0ed0400 R8: 0000000000645b30 R9: 0000000000000001
R10: ffffffffc0ed0400 R11: 0000000000000202 R12: 0000000000645ab0
R13: 0000000000645a10 R14: 0000000000000400 R15: 0000000000000000
ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b

The mount hangs because xfs_ail_push_all_sync is unable to flush the
"AIL" (Active Item List, the intent log) from the journal. But
xfs_ail_push_all_sync does not do the flushing itself. It calls:

wake_up_process(ailp->xa_task);

xa_task here is a kernel thread called "xfsaild/<disk>":

int
xfs_trans_ail_init(
...
        ailp->xa_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
                        ailp->xa_mount->m_fsname);

So, digging into this kernel thread, which is responsible for
committing intent logs:

crash> ps | grep saild
2354 2 3 ffff8808507fb000 IN 0.0 0 0 [xfsaild/sdb1]
2588 2 6 ffff88084717c800 IN 0.0 0 0 [xfsaild/sdh1]
2703 2 9 ffff881042250000 IN 0.0 0 0 [xfsaild/sdd1]

Process 2703 (xfsaild/sdd1) is the one being woken up. Analyzing its
stack:

crash> bt
PID: 2703 TASK: ffff881042250000 CPU: 9 COMMAND: "xfsaild/sdd1"
#0 [ffff88103b72fd38] __schedule at ffffffff8175ded3
#1 [ffff88103b72fdb0] schedule at ffffffff8175e349
#2 [ffff88103b72fdc0] schedule_timeout at ffffffff8175d55d
#3 [ffff88103b72fe70] xfsaild at ffffffffa02f2238 [xfs]
#4 [ffff88103b72fec0] kthread at ffffffff8108fb59
#5 [ffff88103b72ff50] ret_from_fork at ffffffff8176ad7c

I can see that schedule is being called from "schedule_timeout", so
xfsaild is putting itself to sleep for "tout" msecs before trying again.

if (tout)
        schedule_timeout(msecs_to_jiffies(tout));

tout (in msecs) is the return value of this call:

tout = xfsaild_push(ailp);

and "xfsaild_push" is the function responsible for flushing the "ail",
the intent log.

With the trace I asked for, I'll be able to see WHY xfsaild_push is not
emptying the intent log (which is what xfs_ail_push_all_sync needs in
order to return).

This will help clarify two things:

1) why xfs cannot flush its intent log
2) the type of corruption that makes xfs behave like this
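
Once the trace arrives, a first pass could be to count which AIL
trace events dominate the report. The function below is a rough sketch,
not a finished tool: it assumes the report is the gzip-compressed text
produced by "trace-cmd report" above, and that the AIL events in this
kernel have names starting with "xfs_ail_" (for example xfs_ail_pinned
or xfs_ail_flushing, if this XFS version provides them). The function
name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: count AIL-related trace events in a compressed report.
# summarize_ail_events is a hypothetical helper name.

summarize_ail_events() {
    report="${1:-trace_xfs_ail.gz}"

    # Pull out every token that looks like an AIL event name, then
    # print "event count" pairs, most frequent first.
    gzip -dc "$report" \
        | grep -o 'xfs_ail_[a-z_]*' \
        | sort | uniq -c | sort -rn \
        | awk '{ printf "%s %d\n", $2, $1 }'
}

# Usage:
#   summarize_ail_events trace_xfs_ail.gz
```

If one event keeps repeating for the same log item while nothing is
ever removed from the AIL, that item is a likely candidate for what
keeps xfsaild_push from making progress.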

Waiting for more data to continue...

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1382801

Title:
  XFS: mount hangs for corrupted filesystem

Status in “linux” package in Ubuntu:
  Confirmed

Bug description:
  The following situation was brought to my attention:

  --------
  mount hangs at the following stack:
  crash> bt 2882
  PID: 2882 TASK: ffff88084e75c800 CPU: 7 COMMAND: "mount"
  #0 [ffff880036a73b38] schedule at ffffffff8175e320
  #1 [ffff880036a73bc0] xfs_ail_push_all_sync at ffffffffa02e5478 [xfs]
  #2 [ffff880036a73c30] xfs_log_quiesce at ffffffffa02e0b67 [xfs]
  #3 [ffff880036a73c50] xfs_log_unmount at ffffffffa02e0bb6 [xfs]
  #4 [ffff880036a73c70] xfs_mountfs at ffffffffa029332a [xfs]
  #5 [ffff880036a73ce0] xfs_fs_fill_super at ffffffffa0296707 [xfs]
  #6 [ffff880036a73d20] mount_bdev at ffffffff811cd4a9
  #7 [ffff880036a73db0] xfs_fs_mount at ffffffffa02946f5 [xfs]
  #8 [ffff880036a73dc0] mount_fs at ffffffff811ce123
  #9 [ffff880036a73e10] vfs_kern_mount at ffffffff811e9bf6
  #10 [ffff880036a73e60] do_new_mount at ffffffff811eb3a4
  #11 [ffff880036a73ec0] do_mount at ffffffff811ec706
  #12 [ffff880036a73f20] sys_mount at ffffffff811ecad0
  #13 [ffff880036a73f80] system_call_fastpath at ffffffff8176ae2d
  RIP: 00007f2340eb6c2a RSP: 00007fff25675368 RFLAGS: 00010206
  RAX: 00000000000000a5 RBX: ffffffff8176ae2d RCX: 0000000000000026
  RDX: 0000000000b04c20 RSI: 0000000000b04bf0 RDI: 0000000000b04bd0
  RBP: 00000000c0ed0400 R8: 0000000000b04c70 R9: 0000000000000001
  R10: ffffffffc0ed0400 R11: 0000000000000202 R12: 0000000000b04bf0
  R13: 0000000000b04b50 R14: 0000000000000400 R15: 0000000000000000
  ORIG_RAX: 00000000000000a5 CS: 0033 SS: 002b

  The corresponding disk is /dev/sdd1; any I/O (xfs_check, etc.) also
  hangs in "D" state.

  This is reproducible with both the 3.11 and 3.13 kernels.

  The storage node is out of service because of this problem.
  --------

  I'm still asking for more data (sosreport and kernel dump).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1382801/+subscriptions

