Go to file
Linus Torvalds 876195e1c8 mm: make wait_on_page_writeback() wait for multiple pending writebacks
commit c2407cf7d22d0c0d94cf20342b3b8f06f1d904e7 upstream.

Ever since commit 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common()
logic") we've had some very occasional reports of BUG_ON(PageWriteback)
in write_cache_pages(), which we thought we already fixed in commit
073861ed77 ("mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)").

But syzbot just reported another one, even with that commit in place.

And it turns out that there's a simpler way to trigger the BUG_ON() than
the one Hugh found with page re-use.  It all boils down to the fact that
the page writeback is ostensibly serialized by the page lock, but that
isn't actually really true.

Yes, the people _setting_ writeback all do so under the page lock, but
the actual clearing of the bit - and waking up any waiters - happens
without any page lock.

This gives us this fairly simple race condition:

  CPU1 = end previous writeback
  CPU2 = start new writeback under page lock
  CPU3 = write_cache_pages()

  CPU1          CPU2            CPU3
  ----          ----            ----

  end_page_writeback()
    test_clear_page_writeback(page)
    ... delayed...

                lock_page();
                set_page_writeback()
                unlock_page()

                                lock_page()
                                wait_on_page_writeback();

    wake_up_page(page, PG_writeback);
    .. wakes up CPU3 ..

                                BUG_ON(PageWriteback(page));

where the BUG_ON() happens because we woke up the PG_writeback bit
becasue of the _previous_ writeback, but a new one had already been
started because the clearing of the bit wasn't actually atomic wrt the
actual wakeup or serialized by the page lock.

The reason this didn't use to happen was that the old logic in waiting
on a page bit would just loop if it ever saw the bit set again.

The nice proper fix would probably be to get rid of the whole "wait for
writeback to clear, and then set it" logic in the writeback path, and
replace it with an atomic "wait-to-set" (ie the same as we have for page
locking: we set the page lock bit with a single "lock_page()", not with
"wait for lock bit to clear and then set it").

However, out current model for writeback is that the waiting for the
writeback bit is done by the generic VFS code (ie write_cache_pages()),
but the actual setting of the writeback bit is done much later by the
filesystem ".writepages()" function.

IOW, to make the writeback bit have that same kind of "wait-to-set"
behavior as we have for page locking, we'd have to change our roughly
~50 different writeback functions.  Painful.

Instead, just make "wait_on_page_writeback()" loop on the very unlikely
situation that the PG_writeback bit is still set, basically re-instating
the old behavior.  This is very non-optimal in case of contention, but
since we only ever set the bit under the page lock, that situation is
controlled.

Reported-by: syzbot+2fc0712f8f8b8b8fa0ef@syzkaller.appspotmail.com
Fixes: 2a9127fcf2 ("mm: rewrite wait_on_page_bit_common() logic")
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-12 20:18:22 +01:00
arch powerpc: Handle .text.{hot,unlikely}.* in linker script 2021-01-12 20:18:17 +01:00
block scsi: block: Do not accept any requests while suspended 2021-01-12 20:18:17 +01:00
certs
crypto crypto: asym_tpm: correct zero out potential secrets 2021-01-12 20:18:17 +01:00
Documentation dt-bindings: rtc: add reset-source property 2021-01-09 13:46:22 +01:00
drivers hwmon: (amd_energy) fix allocation of hwmon_channel_info config 2021-01-12 20:18:22 +01:00
fs exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
include scsi: block: Do not accept any requests while suspended 2021-01-12 20:18:17 +01:00
init exec: Transform exec_update_mutex into a rw_semaphore 2021-01-09 13:46:24 +01:00
ipc ipc: adjust proc_ipc_sem_dointvec definition to match prototype 2020-09-05 12:14:29 -07:00
kernel workqueue: Kick a worker based on the actual activation of delayed works 2021-01-12 20:18:14 +01:00
lib lib/genalloc: fix the overflow when size is too big 2021-01-12 20:18:16 +01:00
LICENSES LICENSES/deprecated: add Zlib license text 2020-09-16 14:33:49 +02:00
mm mm: make wait_on_page_writeback() wait for multiple pending writebacks 2021-01-12 20:18:22 +01:00
net erspan: fix version 1 check in gre_parse_header() 2021-01-12 20:18:12 +01:00
samples samples/bpf: Fix possible hang in xdpsock with multiple threads 2020-12-30 11:53:49 +01:00
scripts depmod: handle the case of /sbin/depmod without /sbin in PATH 2021-01-12 20:18:16 +01:00
security ima: Don't modify file descriptor mode on the fly 2020-12-30 11:54:17 +01:00
sound ALSA: usb-audio: Fix UBSAN warnings for MIDI jacks 2021-01-12 20:18:20 +01:00
tools selftests/vm: fix building protection keys test 2021-01-12 20:18:14 +01:00
usr
virt kvm: x86/mmu: Support dirty logging for the TDP MMU 2020-10-23 03:42:13 -04:00
.clang-format RDMA 5.10 pull request 2020-10-17 11:18:18 -07:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore .gitignore: docs: ignore sphinx_*/ directories 2020-09-10 10:44:31 -06:00
.mailmap mailmap: add two more addresses of Uwe Kleine-König 2020-12-06 10:19:07 -08:00
COPYING
CREDITS MAINTAINERS: Move Jason Cooper to CREDITS 2020-11-30 10:20:34 +01:00
Kbuild
Kconfig
MAINTAINERS Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-12-10 15:30:13 -08:00
Makefile kbuild: don't hardcode depmod path 2021-01-12 20:18:16 +01:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.