kernel_optimize_test/fs
Mike Rapoport df2cc96e77 userfaultfd: prevent non-cooperative events vs mcopy_atomic races
If a process monitored with userfaultfd changes it's memory mappings or
forks() at the same time as uffd monitor fills the process memory with
UFFDIO_COPY, the actual creation of page table entries and copying of
the data in mcopy_atomic may happen either before of after the memory
mapping modifications and there is no way for the uffd monitor to
maintain consistent view of the process memory layout.

For instance, let's consider fork() running in parallel with
userfaultfd_copy():

process        		         |	uffd monitor
---------------------------------+------------------------------
fork()        		         | userfaultfd_copy()
...        		         | ...
    dup_mmap()        	         |     down_read(mmap_sem)
    down_write(mmap_sem)         |     /* create PTEs, copy data */
        dup_uffd()               |     up_read(mmap_sem)
        copy_page_range()        |
        up_write(mmap_sem)       |
        dup_uffd_complete()      |
            /* notify monitor */ |

If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will
be present by the time copy_page_range() is called and they will appear
in the child's memory mappings.  However, if the fork() is the first to
take the mmap_sem, the new pages won't be mapped in the child's address
space.

If the pages are not present and child tries to access them, the monitor
will get page fault notification and everything is fine.  However, if
the pages *are present*, the child can access them without uffd
noticing.  And if we copy them into child it'll see the wrong data.
Since we are talking about background copy, we'd need to decide whether
the pages should be copied or not regardless #PF notifications.

Since userfaultfd monitor has no way to determine what was the order,
let's disallow userfaultfd_copy in parallel with the non-cooperative
events.  In such case we return -EAGAIN and the uffd monitor can
understand that userfaultfd_copy() clashed with a non-cooperative event
and take an appropriate action.

Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:38 -07:00
..
9p fs/9p: detect invalid options as much as possible 2018-06-07 17:34:34 -07:00
adfs adfs_lookup: do not fail with ENOENT on negatives, use d_splice_alias() 2018-05-22 14:27:56 -04:00
affs affs: fix potential memory leak when parsing option 'prefix' 2018-05-28 12:36:41 +02:00
afs Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
autofs4 autofs: mount point create should honour passed in mode 2018-04-20 17:18:35 -07:00
befs befs_lookup(): use d_splice_alias() 2018-05-21 14:30:07 -04:00
bfs bfs_add_entry: pass name/len as qstr pointer 2018-05-22 14:27:50 -04:00
btrfs for-4.18-tag 2018-06-04 14:29:13 -07:00
cachefiles Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:00:01 -07:00
ceph ceph: fix iov_iter issues in ceph_direct_read_write() 2018-05-10 10:15:12 +02:00
cifs some smb3 fixes for stable, as well as addition of ftrace hooks for cifs.ko, and improvements in compounding and smbdirect (RDMA) 2018-06-04 14:42:46 -07:00
coda
configfs
cramfs cramfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
crypto fscrypt: log the crypto algorithm implementations 2018-05-20 16:36:00 -04:00
debugfs debugfs: inode: debugfs_create_dir uses mode permission from parent 2018-05-14 16:48:18 +02:00
devpts
dlm dlm: remove O_NONBLOCK flag in sctp_connect_to_sock 2018-05-29 10:48:35 -05:00
ecryptfs Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs into aio-base 2018-05-26 09:16:25 +02:00
efivarfs
efs
exofs scsi/osd: remove the gfp argument to osd_start_request 2018-05-14 08:55:09 -06:00
exportfs ovl: do not try to reconnect a disconnected origin dentry 2018-04-12 12:04:49 +02:00
ext2 Changes for 4.18: 2018-06-05 13:24:20 -07:00
ext4 Add bunch of cleanups, and add support for the Speck128/256 2018-06-05 15:15:32 -07:00
f2fs Add bunch of cleanups, and add support for the Speck128/256 2018-06-05 15:15:32 -07:00
fat vfat: simplify checks in vfat_lookup() 2018-05-13 12:09:14 -04:00
freevxfs freevxfs_lookup(): use d_splice_alias() 2018-05-22 14:27:51 -04:00
fscache proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
fuse fuse update for 4.18 2018-06-07 08:50:57 -07:00
gfs2 Changes for 4.18: 2018-06-05 13:24:20 -07:00
hfs hfs: don't allow mounting over .../rsrc 2018-05-22 14:28:00 -04:00
hfsplus Merge branch 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 13:46:22 -07:00
hostfs hostfs: rename do_rmdir() to hostfs_do_rmdir() 2018-04-02 20:15:53 +02:00
hpfs
hugetlbfs hugetlbfs: fix bug in pgoff overflow checking 2018-04-05 21:36:21 -07:00
isofs isofs: fix potential memory leak in mount option parsing 2018-04-16 09:47:41 +02:00
jbd2 jbd2: remove NULL check before calling kmem_cache_destroy() 2018-05-20 22:38:26 -04:00
jffs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
jfs Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:00:01 -07:00
kernfs Driver core changes for 4.18-rc1 2018-06-05 16:29:19 -07:00
lockd
minix minix_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
nfs proc: introduce proc_create_net{,_data} 2018-05-16 07:24:30 +02:00
nfs_common
nfsd for-4.18/block-20180603 2018-06-04 07:58:06 -07:00
nilfs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
nls
notify fsnotify: fix ignore mask logic in send_to_group() 2018-04-13 15:52:49 +02:00
ntfs
ocfs2 fs: ocfs2: use new return type vm_fault_t 2018-06-07 17:34:34 -07:00
omfs omfs_lookup(): report IO errors, use d_splice_alias() 2018-05-22 14:27:58 -04:00
openpromfs openpromfs: switch to d_splice_alias() 2018-05-22 14:27:57 -04:00
orangefs orangefs: fixes and cleanups 2018-06-07 09:23:12 -07:00
overlayfs ovl: use inode_insert5() to hash a newly created inode 2018-05-31 11:06:12 +02:00
proc mm: mark pages in use for page tables 2018-06-07 17:34:37 -07:00
pstore pstore: fix crypto dependencies without compression 2018-04-06 15:45:33 -07:00
qnx4 qnx4_lookup: use d_splice_alias() 2018-05-22 14:27:52 -04:00
qnx6 qnx6_lookup: switch to d_splice_alias() 2018-05-22 14:27:54 -04:00
quota fs: quota: Replace GFP_ATOMIC with GFP_KERNEL in dquot_init 2018-04-09 17:48:54 +02:00
ramfs
reiserfs Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:00:01 -07:00
romfs romfs_lookup: switch to d_splice_alias() 2018-05-22 14:27:55 -04:00
squashfs
sysfs unfuck sysfs_mount() 2018-05-21 14:30:09 -04:00
sysv sysv_lookup: use d_splice_alias() 2018-05-22 14:27:53 -04:00
tracefs
ubifs Add bunch of cleanups, and add support for the Speck128/256 2018-06-05 15:15:32 -07:00
udf \n 2018-06-07 09:36:29 -07:00
ufs do d_instantiate/unlock_new_inode combinations safely 2018-05-11 15:36:37 -04:00
xfs Changes for 4.18: 2018-06-05 13:24:20 -07:00
aio.c aio: sanitize the limit checking in io_submit(2) 2018-05-29 23:20:17 -04:00
anon_inodes.c
attr.c fs: Allow superblock owner to replace invalid owners of inodes 2018-05-24 11:57:18 -05:00
bad_inode.c
binfmt_aout.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_elf_fdpic.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_elf.c fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST error 2018-04-20 17:18:36 -07:00
binfmt_em86.c
binfmt_flat.c exec: introduce finalize_exec() before start_thread() 2018-04-11 10:28:37 -07:00
binfmt_misc.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
binfmt_script.c
block_dev.c fs: convert block_dev.c to bioset_init() 2018-05-30 15:33:32 -06:00
buffer.c fs: move page_cache_seek_hole_data to iomap.c 2018-06-01 18:37:33 -07:00
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c
coredump.c
d_path.c
dax.c fs/dax.c: use new return type vm_fault_t 2018-06-07 17:34:33 -07:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:14:28 -07:00
dcookies.c fs: add do_lookup_dcookie() helper; remove in-kernel call to syscall 2018-04-02 20:15:39 +02:00
direct-io.c block: consistently use GFP_NOIO instead of __GFP_NORECLAIM 2018-05-14 08:55:18 -06:00
drop_caches.c
eventfd.c eventfd: switch to ->poll_mask 2018-05-26 09:16:44 +02:00
eventpoll.c fs: add new vfs_poll and file_can_poll helpers 2018-05-26 09:16:44 +02:00
exec.c umh: introduce fork_usermode_blob() helper 2018-05-23 13:23:39 -04:00
fcntl.c mm: restructure memfd code 2018-06-07 17:34:35 -07:00
fhandle.c
file_table.c
file.c fs: add ksys_close() wrapper; remove in-kernel calls to sys_close() 2018-04-02 20:16:00 +02:00
filesystems.c proc: introduce proc_create_single{,_data} 2018-05-16 07:23:35 +02:00
fs_pin.c
fs_struct.c
fs-writeback.c bdi: Fix oops in wb_workfn() 2018-05-03 16:11:37 -06:00
inode.c overlayfs fixes for 4.18 2018-06-07 08:53:50 -07:00
internal.h Revert "fs: fold open_check_o_direct into do_dentry_open" 2018-06-03 10:58:23 -07:00
ioctl.c fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems 2018-05-24 12:04:28 -05:00
iomap.c fs: use ->is_partially_uptodate in page_cache_seek_hole_data 2018-06-01 18:37:33 -07:00
Kconfig mm: restructure memfd code 2018-06-07 17:34:35 -07:00
Kconfig.binfmt
libfs.c
locks.c proc: introduce proc_create_seq_private 2018-05-16 07:23:35 +02:00
Makefile
mbcache.c
mount.h
mpage.c
namei.c Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-06-04 15:21:19 -07:00
namespace.c fs: Allow superblock owner to access do_remount_sb() 2018-05-24 12:02:25 -05:00
no-block.c
nsfs.c
open.c Revert "fs: fold open_check_o_direct into do_dentry_open" 2018-06-03 10:58:23 -07:00
pipe.c pipe: convert to ->poll_mask 2018-05-26 09:16:44 +02:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range() 2018-04-15 23:36:26 -04:00
readdir.c fs: add ksys_getdents64() helper; remove in-kernel calls to sys_getdents64() 2018-04-02 20:16:02 +02:00
select.c fs: introduce new ->get_poll_head and ->poll_mask methods 2018-05-26 09:16:44 +02:00
seq_file.c proc: fix smaps and meminfo alignment 2018-05-25 18:12:11 -07:00
signalfd.c signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR} 2018-04-26 19:51:14 -05:00
splice.c fs: add do_vmsplice() helper; remove in-kernel call to syscall 2018-04-02 20:15:40 +02:00
stack.c
stat.c fs: add do_readlinkat() helper; remove internal call to sys_readlinkat() 2018-04-02 20:15:34 +02:00
statfs.c
super.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2018-06-04 10:14:28 -07:00
sync.c Changes for this release: 2018-04-04 12:44:02 -07:00
timerfd.c timerfd: convert to ->poll_mask 2018-05-26 09:16:44 +02:00
userfaultfd.c userfaultfd: prevent non-cooperative events vs mcopy_atomic races 2018-06-07 17:34:38 -07:00
utimes.c fs: add do_compat_futimesat() helper; remove in-kernel call to compat syscall 2018-04-02 20:15:44 +02:00
xattr.c vfs: delete unnecessary assignment in vfs_listxattr 2018-05-29 13:22:41 -04:00