kernel_optimize_test/fs
Linus Torvalds 3b2018f9c9 pipe: do FASYNC notifications for every pipe IO, not just state changes
commit fe67f4dd8daa252eb9aa7acb61555f3cc3c1ce4c upstream.

It turns out that the SIGIO/FASYNC situation is almost exactly the same
as the EPOLLET case was: user space really wants to be notified after
every operation.

Now, in a perfect world it should be sufficient to only notify user
space on "state transitions" when the IO state changes (ie when a pipe
goes from unreadable to readable, or from unwritable to writable).  User
space should then do as much as possible - fully emptying the buffer or
what not - and we'll notify it again the next time the state changes.
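
As an illustration of the "do as much as possible" pattern described above, here is a minimal user-space sketch, not taken from stress-ng or from the kernel tree. The helper names arm_sigio(), drain_fd() and on_sigio() are hypothetical; the sketch assumes the read end of the pipe can be put into O_NONBLOCK mode so that draining stops at EAGAIN.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for O_ASYNC on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical flag set by the handler; the real work happens outside it. */
static volatile sig_atomic_t got_sigio;

static void on_sigio(int sig)
{
	(void)sig;
	got_sigio = 1;
}

/* Request SIGIO for fd and make it non-blocking. Returns 0 or -1. */
static int arm_sigio(int fd)
{
	struct sigaction sa;
	int fl;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigio;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGIO, &sa, NULL) < 0)
		return -1;
	if (fcntl(fd, F_SETOWN, getpid()) < 0)
		return -1;
	fl = fcntl(fd, F_GETFL);
	if (fl < 0)
		return -1;
	return fcntl(fd, F_SETFL, fl | O_ASYNC | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Read until the pipe is empty (EAGAIN) or all writers are gone (EOF).
 * Returns the number of bytes drained, or -1 on a real error.
 */
static long drain_fd(int fd)
{
	char buf[4096];
	long total = 0;

	assert(fd >= 0);
	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;
			continue;
		}
		if (n == 0)
			return total;
		if (errno == EINTR)
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return total;
		return -1;
	}
}
```

A caller would run arm_sigio() on the read end once, then call drain_fd() whenever got_sigio is set. Because the loop only stops at EAGAIN or EOF, it does not depend on getting a separate SIGIO for each write.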

But as with EPOLLET, we have at least one case (stress-ng) where the
kernel sent SIGIO due to the pipe being marked for asynchronous
notification, but the user space signal handler then didn't
necessarily read it all before returning (it read more than what was
written, but since there could be multiple writes, it could leave data
pending).

The user space code then expected to get another SIGIO for subsequent
writes - even though the pipe had been readable the whole time - and
would only then read more.

This is arguably a user space bug - and Colin King already fixed the
stress-ng code in question - but the kernel regression rules are clear:
it doesn't matter if kernel people think that user space did something
silly and wrong.  What matters is that it used to work.

So if user space depends on specific historical kernel behavior, it's a
regression when that behavior changes.  It's on us: we were silly to
have that non-optimal historical behavior, and our old kernel behavior
was what user space was tested against.

Because of how the FASYNC notification was tied to wakeup behavior, this
was first broken by commits f467a6a664 and 1b6b26ae70 ("pipe: fix
and clarify pipe read/write wakeup logic"), but at the time it seems
nobody noticed.  Probably because the stress-ng problem case ends up
being timing-dependent too.

It was then unwittingly fixed by commit 3a34b13a88ca ("pipe: make pipe
writes always wake up readers") only to be broken again by commit
3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal
loads").

And at that point the kernel test robot noticed the performance
regression in the stress-ng.sigio.ops_per_sec case.  So the "Fixes" tag
below is somewhat ad hoc, but it matches when the issue was noticed.

Fix it for good (knock wood) by simply making the kill_fasync() case
separate from the wakeup case.  FASYNC is quite rare, and we clearly
shouldn't even try to use the "avoid unnecessary wakeups" logic for it.
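
To make the difference concrete, the following is a toy user-space model, not the fs/pipe.c code. The struct toy_pipe and the functions toy_write() and toy_read() are hypothetical names invented for this sketch. The model only counts how many reader wakeups and SIGIO notifications a sequence of writes would produce when SIGIO is tied to the "was empty" state change, compared with when it is sent for every write.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a pipe: only tracks byte count and notifications. */
struct toy_pipe {
	size_t bytes;		/* data currently buffered */
	unsigned int wakeups;	/* reader wakeups issued */
	unsigned int sigio;	/* SIGIO notifications issued */
};

/*
 * Model a write of n bytes.  The wakeup is only issued on the
 * empty -> non-empty transition.  If fasync_every_io is false, SIGIO
 * follows the wakeup; if true, SIGIO is sent for every write.
 */
static void toy_write(struct toy_pipe *p, size_t n, bool fasync_every_io)
{
	bool was_empty = (p->bytes == 0);

	p->bytes += n;
	if (was_empty)
		p->wakeups++;
	if (fasync_every_io || was_empty)
		p->sigio++;
}

/* Model a read of up to n bytes; returns how many were consumed. */
static size_t toy_read(struct toy_pipe *p, size_t n)
{
	size_t got = n < p->bytes ? n : p->bytes;

	p->bytes -= got;
	return got;
}
```

With a write of 8 bytes, a partial read of 4 and then another write of 4, the coupled variant issues one SIGIO in this model while the separated variant issues two. Both variants issue a single reader wakeup.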

Link: https://lore.kernel.org/lkml/20210824151337.GC27667@xsang-OptiPlex-9020/
Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-09-03 10:09:28 +02:00
9p
adfs
affs fs/affs: release old buffer head on error path 2021-03-04 11:38:37 +01:00
afs afs: Fix tracepoint string placement with built-in AFS 2021-07-28 14:35:41 +02:00
autofs
befs
bfs bfs: don't use WARNING: string when it's just info. 2021-01-06 14:56:52 +01:00
btrfs btrfs: fix race between marking inode needs to be logged and log syncing 2021-09-03 10:09:28 +02:00
cachefiles fs/cachefiles: Remove wait_bit_key layout dependency 2021-03-30 14:32:07 +02:00
ceph ceph: correctly handle releasing an embedded cap flush 2021-09-03 10:09:22 +02:00
cifs cifs: create sd context must be a multiple of 8 2021-08-18 08:59:06 +02:00
coda
configfs configfs: fix memleak in configfs_release_bin_file 2021-07-14 16:56:48 +02:00
cramfs
crypto fscrypt: fix derivation of SipHash keys on big endian CPUs 2021-07-14 16:56:53 +02:00
debugfs debugfs: Make debugfs_allow RO after init 2021-05-19 10:13:19 +02:00
devpts
dlm fs: dlm: fix memory leak when fenced 2021-07-14 16:55:59 +02:00
ecryptfs Revert "ecryptfs: replace BUG_ON with error handling code" 2021-05-26 12:06:55 +02:00
efivarfs
efs
erofs erofs: fix error return code in erofs_read_superblock() 2021-07-14 16:56:53 +02:00
exfat exfat: handle wrong stream entry size in exfat_readdir() 2021-07-14 16:56:52 +02:00
exportfs
ext2
ext4 ext4: fix potential htree corruption when growing large_dir directories 2021-08-12 13:22:14 +02:00
f2fs f2fs: Show casefolding support only when supported 2021-07-25 14:36:17 +02:00
fat
freevxfs
fscache
fuse fuse: reject internal errno 2021-07-14 16:55:47 +02:00
gfs2 gfs2: Fix error handling in init_statfs 2021-07-14 16:55:38 +02:00
hfs hfs: add lock nesting notation to hfs_find_init 2021-07-31 08:16:12 +02:00
hfsplus hfsplus: prevent corruption in shrinking truncate 2021-05-19 10:13:10 +02:00
hostfs hostfs: fix memory handling in follow_link() 2021-04-14 08:42:06 +02:00
hpfs
hugetlbfs hugetlbfs: fix mount mode command line processing 2021-07-28 14:35:46 +02:00
iomap iomap: remove the length variable in iomap_seek_hole 2021-07-31 08:16:12 +02:00
isofs isofs: release buffer head before return 2021-03-04 11:38:00 +01:00
jbd2 ext4: fix debug format string warning 2021-05-19 10:13:19 +02:00
jffs2 jffs2: check the validity of dstlen in jffs2_zlib_compress() 2021-05-11 14:47:36 +02:00
jfs fs/jfs: Fix missing error code in lmLogInit() 2021-07-20 16:05:40 +02:00
kernfs kernfs: wire up ->splice_read and ->splice_write 2021-01-27 11:55:29 +01:00
lockd lockd: don't use interval-based rebinding over TCP 2020-12-30 11:53:30 +01:00
minix
nfs NFSv4/pNFS: Don't call _nfs4_pnfs_v3_ds_connect multiple times 2021-07-20 16:05:53 +02:00
nfs_common nfs_common: need lock during iterate through the list 2020-12-30 11:53:45 +01:00
nfsd nfsd: Reduce contention for the nfsd_file nf_rwsem 2021-07-20 16:05:53 +02:00
nilfs2 nilfs2: fix memory leak in nilfs_sysfs_delete_device_group 2021-06-30 08:47:24 -04:00
nls
notify fanotify: fix copy_event_to_user() fid error clean up 2021-06-23 14:42:41 +02:00
ntfs ntfs: fix validity check for file name attribute 2021-07-14 16:55:38 +02:00
ocfs2 ocfs2: issue zeroout to EOF blocks 2021-08-04 12:46:40 +02:00
omfs
openpromfs
orangefs orangefs: fix orangefs df output. 2021-07-20 16:05:48 +02:00
overlayfs ovl: fix uninitialized pointer read in ovl_lookup_real_one() 2021-09-03 10:09:22 +02:00
proc proc: Avoid mixing integer types in mem_rw() 2021-07-28 14:35:42 +02:00
pstore mark pstore-blk as broken 2021-07-14 16:56:12 +02:00
qnx4
qnx6
quota quota: Fix memory leak when handling corrupted quota file 2021-03-04 11:37:53 +01:00
ramfs
reiserfs reiserfs: check directory items on read from disk 2021-08-12 13:22:19 +02:00
romfs
squashfs squashfs: fix divide error in calculate_skip() 2021-05-19 10:13:10 +02:00
sysfs
sysv
tracefs
ubifs ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode 2021-07-20 16:05:51 +02:00
udf udf: Fix NULL pointer dereference in udf_symlink function 2021-07-19 09:44:40 +02:00
ufs
unicode
vboxsf vboxsf: Add support for the atomic_open directory-inode op 2021-08-18 08:59:18 +02:00
verity
xfs xfs: fix return of uninitialized value in variable error 2021-05-14 09:50:34 +02:00
zonefs zonefs: fix to update .i_wr_refcnt correctly in zonefs_open_zone() 2021-03-25 09:04:05 +01:00
aio.c
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c binfmt_misc: fix possible deadlock in bm_register_write 2021-03-17 17:06:35 +01:00
binfmt_script.c
block_dev.c block: fix a race between del_gendisk and BLKRRPART 2021-06-03 09:00:45 +02:00
buffer.c
char_dev.c
compat_binfmt_elf.c
coredump.c
d_path.c
dax.c dax: fix ENOMEM handling in grab_mapping_entry() 2021-07-14 16:56:13 +02:00
dcache.c
dcookies.c
direct-io.c fs: direct-io: fix missing sdio->boundary 2021-04-14 08:41:58 +02:00
drop_caches.c
eventfd.c
eventpoll.c fs/epoll: restore waking from ep_done_scan() 2021-05-11 14:47:12 +02:00
exec.c Add a reference to ucounts for each cred 2021-07-14 16:55:48 +02:00
fcntl.c fcntl: Fix potential deadlock in send_sig{io, urg}() 2021-01-06 14:56:53 +01:00
fhandle.c
file_table.c
file.c kernel/io_uring: cancel io_uring before task works 2021-01-30 13:55:18 +01:00
filesystems.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fs-writeback.c writeback: fix obtain a reference to a freeing memcg css 2021-07-14 16:56:31 +02:00
fsopen.c
init.c
inode.c fs: Handle I_DONTCACHE in iput_final() instead of generic_drop_inode() 2020-12-30 11:53:49 +01:00
internal.h cgroup1: fix leaked context root causing sporadic NULL deref in LTP 2021-07-31 08:16:11 +02:00
io_uring.c io_uring: only assign io_uring_enter() SQPOLL error in actual error case 2021-08-26 08:35:57 -04:00
io-wq.c io_uring: fix false WARN_ONCE 2021-07-19 09:44:51 +02:00
io-wq.h io_uring: always batch cancel in *cancel_files() 2021-02-13 13:54:56 +01:00
ioctl.c
Kconfig tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha 2021-02-17 11:02:21 +01:00
Kconfig.binfmt
kernel_read_file.c
libfs.c
locks.c Revert "nfsd4: a client's own opens needn't prevent delegations" 2021-03-20 10:43:44 +01:00
Makefile
mbcache.c
mount.h
mpage.c
namei.c LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late 2021-04-14 08:41:58 +02:00
namespace.c fs: warn about impending deprecation of mandatory locks 2021-08-26 08:35:57 -04:00
no-block.c
nsfs.c
open.c open: don't silently ignore unknown O-flags in openat2() 2021-07-14 16:55:59 +02:00
pipe.c pipe: do FASYNC notifications for every pipe IO, not just state changes 2021-09-03 10:09:28 +02:00
pnode.c
pnode.h mount: fix mounting of detached mounts onto targets that reside on shared mounts 2021-03-17 17:06:13 +01:00
posix_acl.c
proc_namespace.c proc mountinfo: make splice available again 2020-12-30 11:54:02 +01:00
read_write.c
readdir.c readdir: make sure to verify directory entry for legacy interfaces too 2021-04-21 13:00:54 +02:00
remap_range.c
select.c kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() 2021-03-25 09:04:16 +01:00
seq_file.c seq_file: disallow extremely large seq buffer allocations 2021-07-20 16:05:59 +02:00
signalfd.c
splice.c
stack.c
stat.c fs: fix reporting supported extra file attributes for statx() 2021-05-11 14:47:33 +02:00
statfs.c
super.c
sync.c
timerfd.c
userfaultfd.c userfaultfd: do not untag user pointers 2021-07-28 14:35:46 +02:00
utimes.c
xattr.c