Go to file
Darrick J. Wong d6f215f359 xfs: split up the xfs_reflink_end_cow work into smaller transactions
In xfs_reflink_end_cow, we allocate a single transaction for the entire
end_cow operation and then loop the CoW fork mappings to move them to
the data fork.  This design fails on a heavily fragmented filesystem
where an inode's data fork has exactly one more extent than would fit in
an extents-format fork, because the unmap can collapse the data fork
into extents format (freeing the bmbt block) but the remap can expand
the data fork back into a (newly allocated) bmbt block.  If the number
of extents we end up remapping is large, we can overflow the block
reservation because we reserved blocks assuming that we were adding
mappings into an already-cleared area of the data fork.

Let's say we have 8 extents in the data fork, 8 extents in the CoW fork,
and the data fork can hold at most 7 extents before needing to convert
to btree format; and that blocks A-P are discontiguous single-block
extents:

   0......7
D: ABCDEFGH
C: IJKLMNOP

When a write to file blocks 0-7 completes, we must remap I-P into the
data fork.  We start by removing H from the btree-format data fork.  Now
we have 7 extents, so we convert the fork to extents format, freeing the
bmbt block.   We then move P into the data fork and it now has 8 extents
again.  We must convert the data fork back to btree format, requiring a
block allocation.  If we repeat this sequence for blocks 6-5-4-3-2-1-0,
we'll need a total of 8 block allocations to remap all 8 blocks.  We
reserved only enough blocks to handle one btree split (5 blocks on a 4k
block filesystem), which means we overflow the block reservation.

To fix this issue, create a separate helper function to remap a single
extent, and change _reflink_end_cow to call it in a tight loop over the
entire range we're completing.  As a side effect this also removes the
size restrictions on how many extents we can end_cow at a time, though
nobody ever hit that.  It is not reasonable to reserve N blocks to remap
N blocks.

Note that this can be reproduced after ~320 million fsx ops while
running generic/938 (long soak directio fsx exerciser):

XFS: Assertion failed: tp->t_blk_res >= tp->t_blk_res_used, file: fs/xfs/xfs_trans.c, line: 116
<machine registers snipped>
Call Trace:
 xfs_trans_dup+0x211/0x250 [xfs]
 xfs_trans_roll+0x6d/0x180 [xfs]
 xfs_defer_trans_roll+0x10c/0x3b0 [xfs]
 xfs_defer_finish_noroll+0xdf/0x740 [xfs]
 xfs_defer_finish+0x13/0x70 [xfs]
 xfs_reflink_end_cow+0x2c6/0x680 [xfs]
 xfs_dio_write_end_io+0x115/0x220 [xfs]
 iomap_dio_complete+0x3f/0x130
 iomap_dio_rw+0x3c3/0x420
 xfs_file_dio_aio_write+0x132/0x3c0 [xfs]
 xfs_file_write_iter+0x8b/0xc0 [xfs]
 __vfs_write+0x193/0x1f0
 vfs_write+0xba/0x1c0
 ksys_write+0x52/0xc0
 do_syscall_64+0x50/0x160
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
2018-12-12 08:46:19 -08:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 15:12:33 -08:00
block blk-mq: punt failed direct issue to dispatch list 2018-12-07 08:16:11 -07:00
certs export.h: remove VMLINUX_SYMBOL() and VMLINUX_SYMBOL_STR() 2018-08-22 23:21:44 +09:00
crypto crypto: user - Disable statistics interface 2018-12-07 13:56:08 +08:00
Documentation Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 15:12:33 -08:00
drivers Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 15:12:33 -08:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs xfs: split up the xfs_reflink_end_cow work into smaller transactions 2018-12-12 08:46:19 -08:00
include Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 15:12:33 -08:00
init initramfs: clean old path before creating a hardlink 2018-11-30 14:56:14 -08:00
ipc ipc: IPCMNI limit check for semmni 2018-10-31 08:54:14 -07:00
kernel Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 15:12:33 -08:00
lib debugobjects: avoid recursive calls with kmemleak 2018-11-30 14:56:14 -08:00
LICENSES This is a fairly typical cycle for documentation. There's some welcome 2018-10-24 18:01:11 +01:00
mm dax fixes 4.20-rc6 2018-12-09 09:54:04 -08:00
net Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 15:12:33 -08:00
samples VFIO updates for v4.20 2018-10-31 11:01:38 -07:00
scripts Fixes for stackleak 2018-12-07 13:13:07 -08:00
security selinux/stable-4.20 PR 20181129 2018-11-29 10:15:06 -08:00
sound ALSA: hda/realtek: Fix mic issue on Acer AIO Veriton Z4860G/Z6860G 2018-12-05 16:39:59 +01:00
tools Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 15:12:33 -08:00
usr initramfs: move gen_initramfs_list.sh from scripts/ to usr/ 2018-08-22 23:21:44 +09:00
virt Revert "mm, mmu_notifier: annotate mmu notifiers with blockable invalidate callbacks" 2018-10-26 16:25:19 -07:00
.clang-format page cache: Convert find_get_pages_contig to XArray 2018-10-21 10:46:34 -04:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap mailmap: Update email for Punit Agrawal 2018-11-05 10:02:11 +00:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS MAINTAINERS: change Sparse's maintainer 2018-11-25 09:17:43 -08:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-12-09 15:12:33 -08:00
Makefile Linux 4.20-rc6 2018-12-09 15:31:00 -08:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.