kernel_optimize_test

Author	SHA1	Message	Date
Miklos Szeredi	d719e8f268	ovl: update atime on upper Fix atime update logic in overlayfs. This patch adds an i_op->update_time() handler to overlayfs inodes. This forwards atime updates to the upper layer only. No atime updates are done on lower layers. Remove implicit atime updates to underlying files and directories with O_NOATIME. Remove explicit atime update in ovl_readlink(). Clear atime related mnt flags from cloned upper mount. This means atime updates are controlled purely by overlayfs mount options. Reported-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:23 +02:00
Miklos Szeredi	bb0d2b8ad2	ovl: fix sgid on directory When creating directory in workdir, the group/sgid inheritance from the parent dir was omitted completely. Fix this by calling inode_init_owner() on overlay inode and using the resulting uid/gid/mode to create the file. Unfortunately the sgid bit can be stripped off due to umask, so need to reset the mode in this case in workdir before moving the directory in place. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:23 +02:00
Miklos Szeredi	9c630ebefe	ovl: simplify permission checking The fact that we always do permission checking on the overlay inode and clear MAY_WRITE for checking access to the lower inode allows cruft to be removed from ovl_permission(). 1) "default_permissions" option effectively did generic_permission() on the overlay inode with i_mode, i_uid and i_gid updated from underlying filesystem. This is what we do by default now. It did the update using vfs_getattr() but that's only needed if the underlying filesystem can change (which is not allowed). We may later introduce a "paranoia_mode" that verifies that mode/uid/gid are not changed. 2) splitting out the IS_RDONLY() check from inode_permission() also becomes unnecessary once we remove the MAY_WRITE from the lower inode check. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:23 +02:00
Vivek Goyal	754f8cb72b	ovl: do not require mounter to have MAY_WRITE on lower Now we have two levels of checks in ovl_permission(). overlay inode is checked with the creds of task while underlying inode is checked with the creds of mounter. Looks like mounter does not have to have WRITE access to files on lower/. So remove the MAY_WRITE from access mask for checks on underlying lower inode. This means task should still have the MAY_WRITE permission on lower inode and mounter is not required to have MAY_WRITE. It also solves the problem of read only NFS mounts being used as lower. If __inode_permission(lower_inode, MAY_WRITE) is called on read only NFS, it fails. By resetting MAY_WRITE, check succeeds and case of read only NFS shold work with overlay without having to specify any special mount options (default permission). Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:23 +02:00
Vivek Goyal	1175b6b8d9	ovl: do operations on underlying file system in mounter's context Given we are now doing checks both on overlay inode as well underlying inode, we should be able to do checks and operations on underlying file system using mounter's context. So modify all operations to do checks/operations on underlying dentry/inode in the context of mounter. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:23 +02:00
Vivek Goyal	c0ca3d70e8	ovl: modify ovl_permission() to do checks on two inodes Right now ovl_permission() calls __inode_permission(realinode), to do permission checks on real inode and no checks are done on overlay inode. Modify it to do checks both on overlay inode as well as underlying inode. Checks on overlay inode will be done with the creds of calling task while checks on underlying inode will be done with the creds of mounter. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:23 +02:00
Vivek Goyal	39a25b2b37	ovl: define ->get_acl() for overlay inodes Now we are planning to do DAC permission checks on overlay inode itself. And to make it work, we will need to make sure we can get acls from underlying inode. So define ->get_acl() for overlay inodes and this in turn calls into underlying filesystem to get acls, if any. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:23 +02:00
Vivek Goyal	72e4848181	ovl: move some common code in a function ovl_create_upper() and ovl_create_over_whiteout() seem to be sharing some common code which can be moved into a separate function. No functionality change. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:23 +02:00
Andreas Gruenbacher	58ed4e70f2	ovl: store ovl_entry in inode->i_private for all inodes Previously this was only done for directory inodes. Doing so for all inodes makes for a nice cleanup in ovl_permission at zero cost. Inodes are not shared for hard links on the overlay, so this works fine. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:22 +02:00
Miklos Szeredi	eead4f2dc4	ovl: use generic_delete_inode No point in keeping overlay inodes around since they will never be reused. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-07-29 12:05:22 +02:00
Miklos Szeredi	c1b2cc1a76	ovl: check mounter creds on underlying lookup The hash salting changes meant that we can no longer reuse the hash in the overlay dentry to look up the underlying dentry. Instead of lookup_hash(), use lookup_one_len_unlocked() and swith to mounter's creds (like we do for all other operations later in the series). Now the lookup_hash() export introduced in 4.6 by `3c9fe8cdff` ("vfs: add lookup_hash() helper") is unused and can possibly be removed; its usefulness negated by the hash salting and the idea that mounter's creds should be used on operations on underlying filesystems. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `8387ff2577` ("vfs: make the string hashes salt the hash")	2016-07-29 12:05:22 +02:00
Miklos Szeredi	1b91dbdd29	Merge branch 'd_real' into overlayfs-next	2016-07-27 11:36:03 +02:00
Maxim Patlasov	cfc9fde0b0	ovl: verify upper dentry in ovl_remove_and_whiteout() The upper dentry may become stale before we call ovl_lock_rename_workdir. For example, someone could (mistakenly or maliciously) manually unlink(2) it directly from upperdir. To ensure it is not stale, let's lookup it after ovl_lock_rename_workdir and and check if it matches the upper dentry. Essentially, it is the same problem and similar solution as in commit `11f3710417` ("ovl: verify upper dentry before unlink and rename"). Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>	2016-07-22 10:54:20 +02:00
Vivek Goyal	07a2daab49	ovl: Copy up underlying inode's ->i_mode to overlay inode Right now when a new overlay inode is created, we initialize overlay inode's ->i_mode from underlying inode ->i_mode but we retain only file type bits (S_IFMT) and discard permission bits. This patch changes it and retains permission bits too. This should allow overlay to do permission checks on overlay inode itself in task context. [SzM] It also fixes clearing suid/sgid bits on write. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `4bacc9c923` ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>	2016-07-04 16:49:48 +02:00
Miklos Szeredi	b99c2d9138	ovl: handle ATTR_KILL* Before `4bacc9c923` ("overlayfs: Make f_path...") file->f_path pointed to the underlying file, hence suid/sgid removal on write worked fine. After that patch file->f_path pointed to the overlay file, and the file mode bits weren't copied to overlay_inode->i_mode. So the suid/sgid removal simply stopped working. The fix is to copy the mode bits, but then ovl_setattr() needs to clear ATTR_MODE to avoid the BUG() in notify_change(). So do this first, then in the next patch copy the mode. Reported-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `4bacc9c923` ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Cc: <stable@vger.kernel.org>	2016-07-04 16:49:48 +02:00
Vivek Goyal	e7c0b5991d	ovl: warn instead of error if d_type is not supported overlay needs underlying fs to support d_type. Recently I put in a patch in to detect this condition and started failing mount if underlying fs did not support d_type. But this breaks existing configurations over kernel upgrade. Those who are running docker (partially broken configuration) with xfs not supporting d_type, are surprised that after kernel upgrade docker does not run anymore. https://github.com/docker/docker/issues/22937#issuecomment-229881315 So instead of erroring out, detect broken configuration and warn about it. This should allow existing docker setups to continue working after kernel upgrade. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `45aebeaf4f` ("ovl: Ensure upper filesystem supports d_type") Cc: <stable@vger.kernel.org> 4.6	2016-07-03 09:39:31 +02:00
Miklos Szeredi	2d902671ce	vfs: merge .d_select_inode() into .d_real() The two methods essentially do the same: find the real dentry/inode belonging to an overlay dentry. The difference is in the usage: vfs_open() uses ->d_select_inode() and expects the function to perform copy-up if necessary based on the open flags argument. file_dentry() uses ->d_real() passing in the overlay dentry as well as the underlying inode. vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real() with a zero inode would have worked just as well here. This patch merges the functionality of ->d_select_inode() into ->d_real() by adding an 'open_flags' argument to the latter. [Al Viro] Make the signature of d_real() match that of ->d_real() again. And constify the inode argument, while we are at it. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-06-30 08:53:27 +02:00
Miklos Szeredi	03bea60409	ovl: get_write_access() in truncate When truncating a file we should check write access on the underlying inode. And we should do so on the lower file as well (before copy-up) for consistency. Original patch and test case by Aihua Zhang. - - >o >o - - test.c - - >o >o - - #include <stdio.h> #include <errno.h> #include <unistd.h> int main(int argc, char *argv[]) { int ret; ret = truncate(argv[0], 4096); if (ret != -1) { fprintf(stderr, "truncate(argv[0]) should have failed\n"); return 1; } if (errno != ETXTBSY) { perror("truncate(argv[0])"); return 1; } return 0; } - - >o >o - - >o >o - - >o >o - - Reported-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>	2016-06-29 16:03:55 +02:00
Miklos Szeredi	a4859d7594	ovl: fix dentry leak for default_permissions When using the 'default_permissions' mount option, ovl_permission() on non-directories was missing a dput(alias), resulting in "BUG Dentry still in use". Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `8d3095f4ad` ("ovl: default permissions") Cc: <stable@vger.kernel.org> # v4.5+	2016-06-29 08:26:59 +02:00
Miklos Szeredi	d0e13f5bbe	ovl: fix uid/gid when creating over whiteout Fix a regression when creating a file over a whiteout. The new file/directory needs to use the current fsuid/fsgid, not the ones from the mounter's credentials. The refcounting is a bit tricky: prepare_creds() sets an original refcount, override_creds() gets one more, which revert_cred() drops. So 1) we need to expicitly put the mounter's credentials when overriding with the updated one 2) we need to put the original ref to the updated creds (and this can safely be done before revert_creds(), since we'll still have the ref from override_creds()). Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Fixes: `3fe6e52f06` ("ovl: override creds with the ones from the superblock mounter") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-06-15 14:18:59 +02:00
Miklos Szeredi	b581755b1c	ovl: xattr filter fix a) ovl_need_xattr_filter() is wrong, we can have multiple lower layers overlaid, all of which (except the lowest one) honouring the "trusted.overlay.opaque" xattr. So need to filter everything except the bottom and the pure-upper layer. b) we no longer can assume that inode is attached to dentry in get/setxattr. This patch unconditionally filters private xattrs to fix both of the above. Performance impact for get/removexattrs is likely in the noise. For listxattrs it might be measurable in pathological cases, but I very much hope nobody cares. If they do, we'll fix it then. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: `b96809173e` ("security_d_instantiate(): move to the point prior to attaching dentry to inode")	2016-06-06 16:21:37 +02:00
Linus Torvalds	d102a56edb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "Followups to the parallel lookup work: - update docs - restore killability of the places that used to take ->i_mutex killably now that we have down_write_killable() merged - Additionally, it turns out that I missed a prerequisite for security_d_instantiate() stuff - ->getxattr() wasn't the only thing that could be called before dentry is attached to inode; with smack we needed the same treatment applied to ->setxattr() as well" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: switch ->setxattr() to passing dentry and inode separately switch xattr_handler->set() to passing dentry and inode separately restore killability of old mutex_lock_killable(&inode->i_mutex) users add down_write_killable_nested() update D/f/directory-locking	2016-05-27 17:14:05 -07:00
Al Viro	3767e255b3	switch ->setxattr() to passing dentry and inode separately smack ->d_instantiate() uses ->setxattr(), so to be able to call it before we'd hashed the new dentry and attached it to inode, we need ->setxattr() instances getting the inode as an explicit argument rather than obtaining it from dentry. Similar change for ->getxattr() had been done in commit `ce23e64`. Unlike ->getxattr() (which is used by both selinux and smack instances of ->d_instantiate()) ->setxattr() is used only by smack one and unfortunately it got missed back then. Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-27 20:09:16 -04:00
Linus Torvalds	0121a32201	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs update from Miklos Szeredi: "The meat of this is a change to use the mounter's credentials for operations that require elevated privileges (such as whiteout creation). This fixes behavior under user namespaces as well as being a nice cleanup" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: Do d_type check only if work dir creation was successful ovl: update documentation ovl: override creds with the ones from the superblock mounter	2016-05-27 16:44:39 -07:00
Vivek Goyal	21765194ce	ovl: Do d_type check only if work dir creation was successful d_type check requires successful creation of workdir as iterates through work dir and expects work dir to be present in it. If that's not the case, this check will always return d_type not supported even if underlying filesystem might be supporting it. So don't do this check if work dir creation failed in previous step. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-05-27 10:18:56 +02:00
Antonio Murdaca	3fe6e52f06	ovl: override creds with the ones from the superblock mounter In user namespace the whiteout creation fails with -EPERM because the current process isn't capable(CAP_SYS_ADMIN) when setting xattr. A simple reproducer: $ mkdir upper lower work merged lower/dir $ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir=work merged $ unshare -m -p -f -U -r bash Now as root in the user namespace: \# touch merged/dir/{1,2,3} # this will force a copy up of lower/dir \# rm -fR merged/* This ends up failing with -EPERM after the files in dir has been correctly deleted: unlinkat(4, "2", 0) = 0 unlinkat(4, "1", 0) = 0 unlinkat(4, "3", 0) = 0 close(4) = 0 unlinkat(AT_FDCWD, "merged/dir", AT_REMOVEDIR) = -1 EPERM (Operation not permitted) Interestingly, if you don't place files in merged/dir you can remove it, meaning if upper/dir does not exist, creating the char device file works properly in that same location. This patch uses ovl_sb_creator_cred() to get the cred struct from the superblock mounter and override the old cred with these new ones so that the whiteout creation is possible because overlay is wrong in assuming that the creds it will get with prepare_creds will be in the initial user namespace. The old cap_raise game is removed in favor of just overriding the old cred struct. This patch also drops from ovl_copy_up_one() the following two lines: override_cred->fsuid = stat->uid; override_cred->fsgid = stat->gid; This is because the correct uid and gid are taken directly with the stat struct and correctly set with ovl_set_attr(). Signed-off-by: Antonio Murdaca <runcom@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-05-27 08:55:26 +02:00
Al Viro	002354112f	restore killability of old mutex_lock_killable(&inode->i_mutex) users The ones that are taking it exclusive, that is... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-26 00:13:25 -04:00
Al Viro	0e0162bb8c	Merge branch 'ovl-fixes' into for-linus Backmerge to resolve a conflict in ovl_lookup_real(); "ovl_lookup_real(): use lookup_one_len_unlocked()" instead, but it was too late in the cycle to rebase.	2016-05-17 02:17:59 -04:00
Miklos Szeredi	38b78a5f18	ovl: ignore permissions on underlying lookup Generally permission checking is not necessary when overlayfs looks up a dentry on one of the underlying layers, since search permission on base directory was already checked in ovl_permission(). More specifically using lookup_one_len() causes a problem when the lower directory lacks search permission for a specific user while the upper directory does have search permission. Since lookups are cached, this causes inconsistency in behavior: success depends on who did the first lookup. So instead use lookup_hash() which doesn't do the permission check. Reported-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-05-10 23:58:18 -04:00
Al Viro	9902af79c0	parallel lookups: actual switch to rwsem ta-da! The main issue is the lack of down_write_killable(), so the places like readdir.c switched to plain inode_lock(); once killable variants of rwsem primitives appear, that'll be dealt with. lockdep side also might need more work Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:49:28 -04:00
Al Viro	b9e1d435fd	ovl_lookup_real(): use lookup_one_len_unlocked() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-05-02 19:47:24 -04:00
Al Viro	84695ffee7	Merge getxattr prototype change into work.lookups The rest of work.xattr stuff isn't needed for this branch	2016-05-02 19:45:47 -04:00
Al Viro	ce23e64013	->getxattr(): pass dentry and inode as separate arguments Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-04-11 00:48:00 -04:00
Miklos Szeredi	d101a12595	fs: add file_dentry() This series fixes bugs in nfs and ext4 due to `4bacc9c923` ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay"). Regular files opened on overlayfs will result in the file being opened on the underlying filesystem, while f_path points to the overlayfs mount/dentry. This confuses filesystems which get the dentry from struct file and assume it's theirs. Add a new helper, file_dentry() [], to get the filesystem's own dentry from the file. This checks file->f_path.dentry->d_flags against DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not set (this is the common, non-overlayfs case). In the uncommon case it will call into overlayfs's ->d_real() to get the underlying dentry, matching file_inode(file). The reason we need to check against the inode is that if the file is copied up while being open, d_real() would return the upper dentry, while the open file comes from the lower dentry. [] If possible, it's better simply to use file_inode() instead. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com> Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com> Cc: <stable@vger.kernel.org> # v4.2 Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Daniel Axtens <dja@axtens.net>	2016-03-26 16:14:37 -04:00
Miklos Szeredi	6986c012fa	ovl: cleanup unused var in rename2 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:46 +01:00
Miklos Szeredi	56656e960b	ovl: rename is_merge to is_lowest The 'is_merge' is an historical naming from when only a single lower layer could exist. With the introduction of multiple lower layers the meaning of this flag was changed to mean only the "lowest layer" (while all lower layers were being merged). So now 'is_merge' is inaccurate and hence renaming to 'is_lowest' Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:46 +01:00
Sohom Bhattacharjee	f134f24465	ovl: fixed coding style warning This patch fixes a newline warning found by the checkpatch.pl tool Signed-off-by: Sohom-Bhattacharjee <soham.bhattacharjee15@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:45 +01:00
Vivek Goyal	45aebeaf4f	ovl: Ensure upper filesystem supports d_type In some instances xfs has been created with ftype=0 and there if a file on lower fs is removed, overlay leaves a whiteout in upper fs but that whiteout does not get filtered out and is visible to overlayfs users. And reason it does not get filtered out because upper filesystem does not report file type of whiteout as DT_CHR during iterate_dir(). So it seems to be a requirement that upper filesystem support d_type for overlayfs to work properly. Do this check during mount and fail if d_type is not supported. Suggested-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:45 +01:00
David Howells	fb5bb2c3b7	ovl: Warn on copy up if a process has a R/O fd open to the lower file Print a warning when overlayfs copies up a file if the process that triggered the copy up has a R/O fd open to the lower file being copied up. This can help catch applications that do things like the following: fd1 = open("foo", O_RDONLY); fd2 = open("foo", O_RDWR); where they expect fd1 and fd2 to refer to the same file - which will no longer be the case post-copy up. With this patch, the following commands: bash 5</mnt/a/foo128 6<>/mnt/a/foo128 assuming /mnt/a/foo128 to be an un-copied up file on an overlay will produce the following warning in the kernel log: overlayfs: Copying up foo129, but open R/O on fd 5 which will cease to be coherent [pid=3818 bash] This is enabled by setting: /sys/module/overlay/parameters/check_copy_up to 1. The warnings are ratelimited and are also limited to one warning per file - assuming the copy up completes in each case. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:45 +01:00
Konstantin Khlebnikov	07f2af7bfd	ovl: honor flag MS_SILENT at mount This patch hides error about missing lowerdir if MS_SILENT is set. We use mount(NULL, "/", "overlay", MS_SILENT, NULL) for testing support of overlayfs: syscall returns -ENODEV if it's not supported. Otherwise kernel automatically loads module and returns -EINVAL because lowerdir is missing. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:45 +01:00
Miklos Szeredi	11f3710417	ovl: verify upper dentry before unlink and rename Unlink and rename in overlayfs checked the upper dentry for staleness by verifying upper->d_parent against upperdir. However the dentry can go stale also by being unhashed, for example. Expand the verification to actually look up the name again (under parent lock) and check if it matches the upper dentry. This matches what the VFS does before passing the dentry to filesytem's unlink/rename methods, which excludes any inconsistency caused by overlayfs. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2016-03-21 17:31:44 +01:00
Konstantin Khlebnikov	b81de061fa	ovl: copy new uid/gid into overlayfs runtime inode Overlayfs must update uid/gid after chown, otherwise functions like inode_owner_or_capable() will check user against stale uid. Catched by xfstests generic/087, it chowns file and calls utimes. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>	2016-03-03 17:17:46 +01:00
Konstantin Khlebnikov	45d1173896	ovl: ignore lower entries when checking purity of non-directory entries After rename file dentry still holds reference to lower dentry from previous location. This doesn't matter for data access because data comes from upper dentry. But this stale lower dentry taints dentry at new location and turns it into non-pure upper. Such file leaves visible whiteout entry after remove in directory which shouldn't have whiteouts at all. Overlayfs already tracks pureness of file location in oe->opaque. This patch just uses that for detecting actual path type. Comment from Vivek Goyal's patch: Here are the details of the problem. Do following. $ mkdir upper lower work merged upper/dir/ $ touch lower/test $ sudo mount -t overlay overlay -olowerdir=lower,upperdir=upper,workdir= work merged $ mv merged/test merged/dir/ $ rm merged/dir/test $ ls -l merged/dir/ /usr/bin/ls: cannot access merged/dir/test: No such file or directory total 0 c????????? ? ? ? ? ? test Basic problem seems to be that once a file has been unlinked, a whiteout has been left behind which was not needed and hence it becomes visible. Whiteout is visible because parent dir is of not type MERGE, hence od->is_real is set during ovl_dir_open(). And that means ovl_iterate() passes on iterate handling directly to underlying fs. Underlying fs does not know/filter whiteouts so it becomes visible to user. Why did we leave a whiteout to begin with when we should not have. ovl_do_remove() checks for OVL_TYPE_PURE_UPPER() and does not leave whiteout if file is pure upper. In this case file is not found to be pure upper hence whiteout is left. So why file was not PURE_UPPER in this case? I think because dentry is still carrying some leftover state which was valid before rename. For example, od->numlower was set to 1 as it was a lower file. After rename, this state is not valid anymore as there is no such file in lower. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Viktor Stanchev <me@viktorstanchev.com> Suggested-by: Vivek Goyal <vgoyal@redhat.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=109611 Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>	2016-03-03 17:17:45 +01:00
Rui Wang	ce9113bbcb	ovl: fix getcwd() failure after unsuccessful rmdir ovl_remove_upper() should do d_drop() only after it successfully removes the dir, otherwise a subsequent getcwd() system call will fail, breaking userspace programs. This is to fix: https://bugzilla.kernel.org/show_bug.cgi?id=110491 Signed-off-by: Rui Wang <rui.y.wang@intel.com> Reviewed-by: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>	2016-03-03 17:17:45 +01:00
Konstantin Khlebnikov	b5891cfab0	ovl: fix working on distributed fs as lower layer This adds missing .d_select_inode into alternative dentry_operations. Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Fixes: `7c03b5d45b` ("ovl: allow distributed fs as lower layer") Fixes: `4bacc9c923` ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay") Reviewed-by: Nikolay Borisov <kernel@kyup.com> Tested-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> # 4.2+	2016-03-03 17:17:45 +01:00
Al Viro	5955102c99	wrappers for ->i_mutex access parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2016-01-22 18:04:28 -05:00
Linus Torvalds	eae21770b4	Merge branch 'akpm' (patches from Andrew) Merge third patch-bomb from Andrew Morton: "I'm pretty much done for -rc1 now: - the rest of MM, basically - lib/ updates - checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit - cpu_mask simplifications - kexec, rapidio, MAINTAINERS etc, etc. - more dma-mapping cleanups/simplifications from hch" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (109 commits) MAINTAINERS: add/fix git URLs for various subsystems mm: memcontrol: add "sock" to cgroup2 memory.stat mm: memcontrol: basic memory statistics in cgroup2 memory controller mm: memcontrol: do not uncharge old page in page cache replacement Documentation: cgroup: add memory.swap.{current,max} description mm: free swap cache aggressively if memcg swap is full mm: vmscan: do not scan anon pages if memcg swap limit is hit swap.h: move memcg related stuff to the end of the file mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online mm: vmscan: pass memcg to get_scan_count() mm: memcontrol: charge swap to cgroup2 mm: memcontrol: clean up alloc, online, offline, free functions mm: memcontrol: flatten struct cg_proto mm: memcontrol: rein in the CONFIG space madness net: drop tcp_memcontrol.c mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM mm: memcontrol: allow to disable kmem accounting for cgroup2 mm: memcontrol: account "kmem" consumers in cgroup2 memory controller mm: memcontrol: move kmem accounting code to CONFIG_MEMCG mm: memcontrol: separate kmem code from legacy tcp accounting code ...	2016-01-21 12:32:08 -08:00
Linus Torvalds	e9f57ebcba	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This contains several bug fixes and a new mount option 'default_permissions' that allows read-only exported NFS filesystems to be used as lower layer" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: check dentry positiveness in ovl_cleanup_whiteouts() ovl: setattr: check permissions before copy-up ovl: root: copy attr ovl: move super block magic number to magic.h ovl: use a minimal buffer in ovl_copy_xattr ovl: allow zero size xattr ovl: default permissions	2016-01-21 12:20:46 -08:00
Andrew Morton	e458bcd16f	fs/overlayfs/super.c needs pagemap.h i386 allmodconfig: In file included from fs/overlayfs/super.c:10:0: fs/overlayfs/super.c: In function 'ovl_fill_super': include/linux/fs.h:898:36: error: 'PAGE_CACHE_SIZE' undeclared (first use in this function) #define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) ^ fs/overlayfs/super.c:939:19: note: in expansion of macro 'MAX_LFS_FILESIZE' sb->s_maxbytes = MAX_LFS_FILESIZE; ^ include/linux/fs.h:898:36: note: each undeclared identifier is reported only once for each function it appears in #define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) ^ fs/overlayfs/super.c:939:19: note: in expansion of macro 'MAX_LFS_FILESIZE' sb->s_maxbytes = MAX_LFS_FILESIZE; ^ Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-20 17:09:18 -08:00
Al Viro	fceef393a5	switch ->get_link() to delayed_call, kill ->put_link() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-30 13:01:03 -05:00
Konstantin Khlebnikov	84889d4933	ovl: check dentry positiveness in ovl_cleanup_whiteouts() This patch fixes kernel crash at removing directory which contains whiteouts from lower layers. Cache of directory content passed as "list" contains entries from all layers, including whiteouts from lower layers. So, lookup in upper dir (moved into work at this stage) will return negative entry. Plus this cache is filled long before and we can race with external removal. Example: mkdir -p lower0/dir lower1/dir upper work overlay touch lower0/dir/a lower0/dir/b mknod lower1/dir/a c 0 0 mount -t overlay none overlay -o lowerdir=lower1:lower0,upperdir=upper,workdir=work rm -fr overlay/dir Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> # 3.18+	2015-12-11 16:30:49 +01:00
Miklos Szeredi	cf9a6784f7	ovl: setattr: check permissions before copy-up Without this copy-up of a file can be forced, even without actually being allowed to do anything on the file. [Arnd Bergmann] include <linux/pagemap.h> for PAGE_CACHE_SIZE (used by MAX_LFS_FILESIZE definition). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>	2015-12-11 16:30:49 +01:00
Miklos Szeredi	ed06e06977	ovl: root: copy attr We copy i_uid and i_gid of underlying inode into overlayfs inode. Except for the root inode. Fix this omission. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>	2015-12-09 16:11:59 +01:00
Al Viro	6b2553918d	replace ->follow_link() with new method that could stay in RCU mode new method: ->get_link(); replacement of ->follow_link(). The differences are: * inode and dentry are passed separately * might be called both in RCU and non-RCU mode; the former is indicated by passing it a NULL dentry. * when called that way it isn't allowed to block and should return ERR_PTR(-ECHILD) if it needs to be called in non-RCU mode. It's a flagday change - the old method is gone, all in-tree instances converted. Conversion isn't hard; said that, so far very few instances do not immediately bail out when called in RCU mode. That'll change in the next commits. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-08 22:41:54 -05:00
Al Viro	0f7ff2dabb	ovl: get rid of the dead code left from broken (and disabled) optimizations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 12:31:07 -05:00
Miklos Szeredi	acff81ec2c	ovl: fix permission checking for setattr [Al Viro] The bug is in being too enthusiastic about optimizing ->setattr() away - instead of "copy verbatim with metadata" + "chmod/chown/utimes" (with the former being always safe and the latter failing in case of insufficient permissions) it tries to combine these two. Note that copyup itself will have to do ->setattr() anyway; _that_ is where the elevated capabilities are right. Having these two ->setattr() (one to set verbatim copy of metadata, another to do what overlayfs ->setattr() had been asked to do in the first place) combined is where it breaks. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-12-06 12:28:23 -05:00
Stephen Hemminger	257f871993	ovl: move super block magic number to magic.h The overlayfs file system is not recognized by programs like tail because the magic number is not in standard header location. Move it so that the value will propagate on for the GNU library and utilities. Needs to go in the fstatfs manual page as well. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>	2015-11-10 17:13:35 +01:00
Vito Caputo	e4ad29fa0d	ovl: use a minimal buffer in ovl_copy_xattr Rather than always allocating the high-order XATTR_SIZE_MAX buffer which is costly and prone to failure, only allocate what is needed and realloc if necessary. Fixes https://github.com/coreos/bugs/issues/489 Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>	2015-11-10 17:08:42 +01:00
Miklos Szeredi	97daf8b97a	ovl: allow zero size xattr When ovl_copy_xattr() encountered a zero size xattr no more xattrs were copied and the function returned success. This is clearly not the desired behavior. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org>	2015-11-10 17:08:41 +01:00
Linus Torvalds	4bb0fb57f3	Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs bug fixes from Miklos Szeredi: "This contains fixes for bugs that appeared in earlier kernels (all are marked for -stable)" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: free lower_mnt array in ovl_put_super ovl: free stack of paths in ovl_fill_super ovl: fix open in stacked overlay ovl: fix dentry reference leak ovl: use O_LARGEFILE in ovl_copy_up()	2015-10-31 14:49:19 -07:00
Miklos Szeredi	8d3095f4ad	ovl: default permissions Add mount option "default_permissions" to alter the way permissions are calculated. Without this option and prior to this patch permissions were calculated by underlying lower or upper filesystem. With this option the permissions are calculated by overlayfs based on the file owner, group and mode bits. This has significance for example when a read-only exported NFS filesystem is used as a lower layer. In this case the underlying NFS filesystem will reply with EROFS, in which case all we know is that the filesystem is read-only. But that's not what we are interested in, we are interested in whether the access would be allowed if the filesystem wasn't read-only; the server doesn't tell us that, and would need updating at various levels, which doesn't seem practicable. Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>	2015-10-12 17:11:44 +02:00
Konstantin Khlebnikov	5ffdbe8bf1	ovl: free lower_mnt array in ovl_put_super This fixes memory leak after umount. Kmemleak report: unreferenced object 0xffff8800ba791010 (size 8): comm "mount", pid 2394, jiffies 4294996294 (age 53.920s) hex dump (first 8 bytes): 20 1c 13 02 00 88 ff ff ....... backtrace: [<ffffffff811f8cd4>] create_object+0x124/0x2c0 [<ffffffff817a059b>] kmemleak_alloc+0x7b/0xc0 [<ffffffff811dffe6>] __kmalloc+0x106/0x340 [<ffffffffa0152bfc>] ovl_fill_super+0x55c/0x9b0 [overlay] [<ffffffff81200ac4>] mount_nodev+0x54/0xa0 [<ffffffffa0152118>] ovl_mount+0x18/0x20 [overlay] [<ffffffff81201ab3>] mount_fs+0x43/0x170 [<ffffffff81220d34>] vfs_kern_mount+0x74/0x170 [<ffffffff812233ad>] do_mount+0x22d/0xdf0 [<ffffffff812242cb>] SyS_mount+0x7b/0xc0 [<ffffffff817b6bee>] entry_SYSCALL_64_fastpath+0x12/0x76 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: `dd662667e6` ("ovl: add mutli-layer infrastructure") Cc: <stable@vger.kernel.org> # v4.0+	2015-10-12 17:11:44 +02:00
Konstantin Khlebnikov	0f95502ad8	ovl: free stack of paths in ovl_fill_super This fixes small memory leak after mount. Kmemleak report: unreferenced object 0xffff88003683fe00 (size 16): comm "mount", pid 2029, jiffies 4294909563 (age 33.380s) hex dump (first 16 bytes): 20 27 1f bb 00 88 ff ff 40 4b 0f 36 02 88 ff ff '......@K.6.... backtrace: [<ffffffff811f8cd4>] create_object+0x124/0x2c0 [<ffffffff817a059b>] kmemleak_alloc+0x7b/0xc0 [<ffffffff811dffe6>] __kmalloc+0x106/0x340 [<ffffffffa01b7a29>] ovl_fill_super+0x389/0x9a0 [overlay] [<ffffffff81200ac4>] mount_nodev+0x54/0xa0 [<ffffffffa01b7118>] ovl_mount+0x18/0x20 [overlay] [<ffffffff81201ab3>] mount_fs+0x43/0x170 [<ffffffff81220d34>] vfs_kern_mount+0x74/0x170 [<ffffffff812233ad>] do_mount+0x22d/0xdf0 [<ffffffff812242cb>] SyS_mount+0x7b/0xc0 [<ffffffff817b6bee>] entry_SYSCALL_64_fastpath+0x12/0x76 [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: `a78d9f0d5d` ("ovl: support multiple lower layers") Cc: <stable@vger.kernel.org> # v4.0+	2015-10-12 17:11:43 +02:00
Miklos Szeredi	1c8a47df36	ovl: fix open in stacked overlay If two overlayfs filesystems are stacked on top of each other, then we need recursion in ovl_d_select_inode(). I guess d_backing_inode() is supposed to do that. But currently it doesn't and that functionality is open coded in vfs_open(). This is now copied into ovl_d_select_inode() to fix this regression. Reported-by: Alban Crequy <alban.crequy@gmail.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Fixes: `4bacc9c923` ("overlayfs: Make f_path always point to the overlay...") Cc: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> # v4.2+	2015-10-12 15:56:20 +02:00
David Howells	ab79efab0a	ovl: fix dentry reference leak In ovl_copy_up_locked(), newdentry is leaked if the function exits through out_cleanup as this just to out after calling ovl_cleanup() - which doesn't actually release the ref on newdentry. The out_cleanup segment should instead exit through out2 as certainly newdentry leaks - and possibly upper does also, though this isn't caught given the catch of newdentry. Without this fix, something like the following is seen: BUG: Dentry ffff880023e9eb20{i=f861,n=#ffff880023e82d90} still in use (1) [unmount of tmpfs tmpfs] BUG: Dentry ffff880023ece640{i=0,n=bigfile} still in use (1) [unmount of tmpfs tmpfs] when unmounting the upper layer after an error occurred in copyup. An error can be induced by creating a big file in a lower layer with something like: dd if=/dev/zero of=/lower/a/bigfile bs=65536 count=1 seek=$((0xf000)) to create a large file (4.1G). Overlay an upper layer that is too small (on tmpfs might do) and then induce a copy up by opening it writably. Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> # v3.18+	2015-10-12 15:56:20 +02:00
David Howells	0480334fa6	ovl: use O_LARGEFILE in ovl_copy_up() Open the lower file with O_LARGEFILE in ovl_copy_up(). Pass O_LARGEFILE unconditionally in ovl_copy_up_data() as it's purely for catching 32-bit userspace dealing with a file large enough that it'll be mishandled if the application isn't aware that there might be an integer overflow. Inside the kernel, there shouldn't be any problems. Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Cc: <stable@vger.kernel.org> # v3.18+	2015-10-12 15:56:20 +02:00
Kees Cook	a068acf2ee	fs: create and use seq_show_option for escaping Many file systems that implement the show_options hook fail to correctly escape their output which could lead to unescaped characters (e.g. new lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files. This could lead to confusion, spoofed entries (resulting in things like systemd issuing false d-bus "mount" notifications), and who knows what else. This looks like it would only be the root user stepping on themselves, but it's possible weird things could happen in containers or in other situations with delegated mount privileges. Here's an example using overlay with setuid fusermount trusting the contents of /proc/mounts (via the /etc/mtab symlink). Imagine the use of "sudo" is something more sneaky: $ BASE="ovl" $ MNT="$BASE/mnt" $ LOW="$BASE/lower" $ UP="$BASE/upper" $ WORK="$BASE/work/ 0 0 none /proc fuse.pwn user_id=1000" $ mkdir -p "$LOW" "$UP" "$WORK" $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt $ cat /proc/mounts none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0 none /proc fuse.pwn user_id=1000 0 0 $ fusermount -u /proc $ cat /proc/mounts cat: /proc/mounts: No such file or directory This fixes the problem by adding new seq_show_option and seq_show_option_n helpers, and updating the vulnerable show_option handlers to use them as needed. Some, like SELinux, need to be open coded due to unusual existing escape mechanisms. [akpm@linux-foundation.org: add lost chunk, per Kees] [keescook@chromium.org: seq_show_option should be using const parameters] Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Jan Kara <jack@suse.com> Acked-by: Paul Moore <paul@paul-moore.com> Cc: J. R. Okajima <hooanon05g@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-09-04 16:54:41 -07:00
Al Viro	9391dd00d1	fix a braino in ovl_d_select_inode() when opening a directory we want the overlayfs inode, not one from the topmost layer. Reported-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Tested-By: Andrey Jr. Melnikov <temnota.am@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-07-12 11:22:05 -04:00
Linus Torvalds	1dc51b8288	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...	2015-07-04 19:36:06 -07:00
Linus Torvalds	320cd413fa	Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This relaxes the requirements on the lower layer filesystem: now ones that implement .d_revalidate, such as NFS, can be used. Upper layer filesystems still has the "no .d_revalidate" requirement. Also a bad interaction with jffs2 locking has been fixed" * 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: lookup whiteouts outside iterate_dir() ovl: allow distributed fs as lower layer ovl: don't traverse automount points	2015-07-02 11:23:00 -07:00
Linus Torvalds	052b398a43	Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "In this pile: pathname resolution rewrite. - recursion in link_path_walk() is gone. - nesting limits on symlinks are gone (the only limit remaining is that the total amount of symlinks is no more than 40, no matter how nested). - "fast" (inline) symlinks are handled without leaving rcuwalk mode. - stack footprint (independent of the nesting) is below kilobyte now, about on par with what it used to be with one level of nested symlinks and ~2.8 times lower than it used to be in the worst case. - struct nameidata is entirely private to fs/namei.c now (not even opaque pointers are being passed around). - ->follow_link() and ->put_link() calling conventions had been changed; all in-tree filesystems converted, out-of-tree should be able to follow reasonably easily. For out-of-tree conversions, see Documentation/filesystems/porting for details (and in-tree filesystems for examples of conversion). That has sat in -next since mid-May, seems to survive all testing without regressions and merges clean with v4.1" * 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits) turn user_{path_at,path,lpath,path_dir}() into static inlines namei: move saved_nd pointer into struct nameidata inline user_path_create() inline user_path_parent() namei: trim do_last() arguments namei: stash dfd and name into nameidata namei: fold path_cleanup() into terminate_walk() namei: saner calling conventions for filename_parentat() namei: saner calling conventions for filename_create() namei: shift nameidata down into filename_parentat() namei: make filename_lookup() reject ERR_PTR() passed as name namei: shift nameidata inside filename_lookup() namei: move putname() call into filename_lookup() namei: pass the struct path to store the result down into path_lookupat() namei: uninline set_root{,_rcu}() namei: be careful with mountpoint crossings in follow_dotdot_rcu() Documentation: remove outdated information from automount-support.txt get rid of assorted nameidata-related debris lustre: kill unused helper lustre: kill unused macro (LOOKUP_CONTINUE) ...	2015-06-22 12:51:21 -07:00
Miklos Szeredi	cdb6727958	ovl: lookup whiteouts outside iterate_dir() If jffs2 can deadlock on overlayfs readdir because it takes the same lock on ->iterate() as in ->lookup(). Fix by moving whiteout checking outside iterate_dir(). Optimized by collecting potential whiteouts (DT_CHR) in a temporary list and if non-empty iterating throug these and checking for a 0/0 chardev. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Fixes: `49c21e1cac` ("ovl: check whiteout while reading directory") Reported-by: Roman Yeryomin <leroi.lists@gmail.com>	2015-06-22 13:53:48 +02:00
Miklos Szeredi	7c03b5d45b	ovl: allow distributed fs as lower layer Allow filesystems with .d_revalidate as lower layer(s), but not as upper layer. For local filesystems the rule was that modifications on the layers directly while being part of the overlay results in undefined behavior. This can easily be extended to distributed filesystems: we assume the tree used as lower layer is static, which means ->d_revalidate() should always return "1". If that is not the case, return -ESTALE, don't try to work around the modification. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-06-22 13:53:48 +02:00
Miklos Szeredi	a6f15d9a75	ovl: don't traverse automount points NFS and other distributed filesystems may place automount points in the tree. Previoulsy overlayfs refused to mount such filesystems types (based on the existence of the .d_automount callback), even if the actual export didn't have any automount points. It cannot be determined in advance whether the filesystem has automount points or not. The solution is to allow fs with .d_automount but refuse to traverse any automount points encountered. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-06-22 13:53:48 +02:00
David Howells	4bacc9c923	overlayfs: Make f_path always point to the overlay and f_inode to the underlay Make file->f_path always point to the overlay dentry so that the path in /proc/pid/fd is correct and to ensure that label-based LSMs have access to the overlay as well as the underlay (path-based LSMs probably don't need it). Using my union testsuite to set things up, before the patch I see: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:38 5 -> /a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 13381 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 13381 Links: 1 ... After the patch: [root@andromeda union-testsuite]# bash 5</mnt/a/foo107 [root@andromeda union-testsuite]# ls -l /proc/$$/fd/ ... lr-x------. 1 root root 64 Jun 5 14:22 5 -> /mnt/a/foo107 [root@andromeda union-testsuite]# stat /mnt/a/foo107 ... Device: 23h/35d Inode: 40346 Links: 1 ... [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5 ... Device: 23h/35d Inode: 40346 Links: 1 ... Note the change in where /proc/$$/fd/5 points to in the ls command. It was pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107 (which is correct). The inode accessed, however, is the lower layer. The union layer is on device 25h/37d and the upper layer on 24h/36d. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-06-19 03:19:32 -04:00
David Howells	f25801ee46	overlay: Call ovl_drop_write() earlier in ovl_dentry_open() Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open() as we've done the copy up for which we needed the freeze-write lock by that point. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-06-19 03:19:31 -04:00
Miklos Szeredi	cc6f67bcaf	ovl: mount read-only if workdir can't be created OpenWRT folks reported that overlayfs fails to mount if upper fs is full, because workdir can't be created. Wordir creation can fail for various other reasons too. There's no reason that the mount itself should fail, overlayfs can work fine without a workdir, as long as the overlay isn't modified. So mount it read-only and don't allow remounting read-write. Add a couple of WARN_ON()s for the impossible case of workdir being used despite being read-only. Reported-by: Bastian Bittorf <bittorf@bluebottle.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.18+	2015-05-19 14:30:12 +02:00
Miklos Szeredi	d377c5eb54	ovl: don't remove non-empty opaque directory When removing an opaque directory we can't just call rmdir() to check for emptiness, because the directory will need to be replaced with a whiteout. The replacement is done with RENAME_EXCHANGE, which doesn't check emptiness. Solution is just to check emptiness by reading the directory. In the future we could add a new rename flag to check for emptiness even for RENAME_EXCHANGE to optimize this case. Reported-by: Vincent Batts <vbatts@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Jordi Pujol Palomer <jordipujolp@gmail.com> Fixes: `263b4a0fee` ("ovl: dont replace opaque dir") Cc: <stable@vger.kernel.org> # v4.0+	2015-05-14 10:04:44 +02:00
Al Viro	5f2c4179e1	switch ->put_link() from dentry to inode only one instance looks at that argument at all; that sole exception wants inode rather than dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-11 08:13:12 -04:00
Al Viro	6e77137b36	don't pass nameidata to ->follow_link() its only use is getting passed to nd_jump_link(), which can obtain it from current->nameidata Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:20:15 -04:00
Al Viro	680baacbca	new ->follow_link() and ->put_link() calling conventions a) instead of storing the symlink body (via nd_set_link()) and returning an opaque pointer later passed to ->put_link(), ->follow_link() _stores_ that opaque pointer (into void * passed by address by caller) and returns the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic symlinks) and pointer to symlink body for normal symlinks. Stored pointer is ignored in all cases except the last one. Storing NULL for opaque pointer (or not storing it at all) means no call of ->put_link(). b) the body used to be passed to ->put_link() implicitly (via nameidata). Now only the opaque pointer is. In the cases when we used the symlink body to free stuff, ->follow_link() now should store it as opaque pointer in addition to returning it. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:19:45 -04:00
NeilBrown	3188b2955d	ovl: rearrange ovl_follow_link to it doesn't need to call ->put_link ovl_follow_link current calls ->put_link on an error path. However ->put_link is about to change in a way that it will be impossible to call it from ovl_follow_link. So rearrange the code to avoid the need for that error path. Specifically: move the kmalloc() call before the ->follow_link() call to the subordinate filesystem. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-05-10 22:18:20 -04:00
hujianyang	71cbad7e69	ovl: upper fs should not be R/O After importing multi-lower layer support, users could mount a r/o partition as the left most lowerdir instead of using it as upperdir. And a r/o upperdir may cause an error like overlayfs: failed to create directory ./workdir/work during mount. This patch check the s_flags of upper fs and return an error if it is a r/o partition. The checking of upper_mnt->mnt_sb->s_flags can be removed now. This patch also remove /* FIXME: workdir is not needed for a R/O mount */ from ovl_fill_super() because: 1) for upper fs r/o case Setting a r/o partition as upper is prevented, no need to care about workdir in this case. 2) for "mount overlay -o ro" with a r/w upper fs case Users could remount overlayfs to r/w in this case, so workdir should not be omitted. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-03-18 10:29:48 +01:00
hujianyang	6be4506e34	ovl: check lowerdir amount for non-upper mount Recently multi-lower layer mount support allow upperdir and workdir to be omitted, then cause overlayfs can be mount with only one lowerdir directory. This action make no sense and have potential risk. This patch check the total number of lower directories to prevent mounting overlayfs with only one directory. Also, an error message is added to indicate lower directories exceed OVL_MAX_STACK limit. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-03-18 10:29:47 +01:00
hujianyang	bead55ef77	ovl: print error message for invalid mount options Overlayfs should print an error message if an incorrect mount option is caught like other filesystems. After this patch, improper option input could be clearly known. Reported-by: Fabian Sturm <fabian.sturm@aduu.de> Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-03-18 10:29:47 +01:00
David Howells	e36cb0b89c	VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_(dentry) Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' \|') \|\| die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") \|\| die $coccifile; print($fd "$_\n") \|\| die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 \|\| die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-02-22 11:38:41 -05:00
Al Viro	ce7b9facdf	Merge branch 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into for-next	2015-02-20 04:58:52 -05:00
hujianyang	4330397e4e	ovl: discard independent cursor in readdir() Since the ovl_dir_cache is stable during a directory reading, the cursor of struct ovl_dir_file don't need to be an independent entry in the list of a merged directory. This patch changes cursor to a pointer which points to the entry in the ovl_dir_cache. After this, we don't need to check is_cursor either. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-01-09 14:55:57 +01:00
Seunghun Lee	3cdf6fe910	ovl: Prevent rw remount when it should be ro mount Overlayfs should be mounted read-only when upper-fs is read-only or nonexistent. But now it can be remounted read-write and this can cause kernel panic. So we should prevent read-write remount when the above situation happens. Signed-off-by: Seunghun Lee <waydi1@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-01-08 14:47:21 +01:00
hujianyang	a425c037f3	ovl: Fix opaque regression in ovl_lookup Current multi-layer support overlayfs has a regression in .lookup(). If there is a directory in upperdir and a regular file has same name in lowerdir in a merged directory, lower file is hidden and upper directory is set to opaque in former case. But it is changed in present code. In lowerdir lookup path, if a found inode is not directory, the type checking of previous inode is missing. This inode will be copied to the lowerstack of ovl_entry directly. That will lead to several wrong conditions, for example, the reading of the directory in upperdir may return an error like: ls: reading directory .: Not a directory This patch makes the lowerdir lookup path check the opaque for non-directory file too. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-01-08 14:47:20 +01:00
hujianyang	2f83fd8c28	ovl: Fix kernel panic while mounting overlayfs The function ovl_fill_super() in recently multi-layer support version will incorrectly return 0 at error handling path and then cause kernel panic. This failure can be reproduced by mounting a overlayfs with upperdir and workdir in different mounts. And also, If the memory allocation of lower_mnt fail, this function may return an zero either. This patch fix this problem by setting err to proper error number before jumping to error handling path. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2015-01-08 14:47:20 +01:00
hujianyang	cead89bb08	ovl: Use macros to present ovl_xattr This patch adds two macros: OVL_XATTR_PRE_NAME and OVL_XATTR_PRE_LEN to present ovl_xattr name prefix and its length. Also, a new macro OVL_XATTR_OPAQUE is introduced to replace old ovl_opaque_xattr. Fix the length of "trusted.overlay." to 16. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:52 +01:00
hujianyang	1ba38725a3	ovl: Cleanup redundant blank lines This patch removes redundant blanks lines in overlayfs. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:52 +01:00
Miklos Szeredi	a78d9f0d5d	ovl: support multiple lower layers Allow "lowerdir=" option to contain multiple lower directories separated by a colon (e.g. "lowerdir=/bin:/usr/bin"). Colon characters in filenames can be escaped with a backslash. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:52 +01:00
Miklos Szeredi	53a08cb9b8	ovl: make upperdir optional Make "upperdir=" mount option optional. If "upperdir=" is not given, then the "workdir=" option is also optional (and ignored if given). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:51 +01:00
Miklos Szeredi	ab508822ca	ovl: improve mount helpers Move common checks into ovl_mount_dir() helper. Create helper for looking up lower directories. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:49 +01:00
Miklos Szeredi	3b7a9a249a	ovl: mount: change order of initialization Move allocation of root entry above to where it's needed. Move initializations related to upperdir and workdir near each other. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:48 +01:00
Miklos Szeredi	4ebc581828	ovl: allow statfs if no upper layer Handle "no upper layer" case in statfs. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:46 +01:00
Miklos Szeredi	09e10322b7	ovl: lookup ENAMETOOLONG on lower means ENOENT "Suppose you have in one of the lower layers a filesystem with ->lookup()-enforced upper limit on name length. Pretty much every local fs has one, but... they are not all equal. 255 characters is the common upper limit, but e.g. jffs2 stops at 254, minixfs upper limit is somewhere from 14 to 60, depending upon version, etc. You are doing a lookup for something that is present in upper layer, but happens to be too long for one of the lower layers. Too bad - ENAMETOOLONG for you..." Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:45 +01:00
Miklos Szeredi	3e01cee3b9	ovl: check whiteout on lowest layer as well Not checking whiteouts on lowest layer was an optimization (there's nothing to white out there), but it could result in inconsitent behavior when a layer previously used as upper/middle is later used as lowest. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:45 +01:00
Miklos Szeredi	3d3c6b8939	ovl: multi-layer lookup Look up dentry in all relevant layers. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:44 +01:00
Miklos Szeredi	9d7459d834	ovl: multi-layer readdir If multiple lower layers exist, merge them as well in readdir according to the same rules as merging upper with lower. I.e. take whiteouts and opaque directories into account on all but the lowers layer. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:44 +01:00
Miklos Szeredi	5ef88da56a	ovl: helper to iterate layers Add helper to iterate through all the layers, starting from the upper layer (if exists) and continuing down through the lower layers. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:43 +01:00
Miklos Szeredi	dd662667e6	ovl: add mutli-layer infrastructure Add multiple lower layers to 'struct ovl_fs' and 'struct ovl_entry'. ovl_entry will have an array of paths, instead of just the dentry. This allows a compact array containing just the layers which exist at current point in the tree (which is expected to be a small number for the majority of dentries). The number of layers is not limited by this infrastructure. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:43 +01:00
Miklos Szeredi	263b4a0fee	ovl: dont replace opaque dir When removing an empty opaque directory, then it makes no sense to replace it with an exact replica of itself before removal. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:43 +01:00
Miklos Szeredi	1afaba1ecb	ovl: make path-type a bitmap OVL_PATH_PURE_UPPER -> __OVL_PATH_UPPER \| __OVL_PATH_PURE OVL_PATH_UPPER -> __OVL_PATH_UPPER OVL_PATH_MERGE -> __OVL_PATH_UPPER \| __OVL_PATH_MERGE OVL_PATH_LOWER -> 0 Multiple R/O layers will allow __OVL_PATH_MERGE without __OVL_PATH_UPPER. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:42 +01:00
Miklos Szeredi	49c21e1cac	ovl: check whiteout while reading directory Don't make a separate pass for checking whiteouts, since we can do it while reading the upper directory. This will make it easier to handle multiple layers. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-12-13 00:59:42 +01:00
Al Viro	ba00410b81	Merge branch 'iov_iter' into for-next	2014-12-08 20:39:29 -05:00
Miklos Szeredi	7676895f47	ovl: ovl_dir_fsync() cleanup Check against !OVL_PATH_LOWER instead of OVL_PATH_MERGE. For a copied up directory the two are currently equivalent. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-11-20 16:40:02 +01:00
Miklos Szeredi	c9f00fdb9a	ovl: pass dentry into ovl_dir_read_merged() Pass dentry into ovl_dir_read_merged() insted of upperpath and lowerpath. This cleans up callers and paves the way for multi-layer directory reads. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-11-20 16:40:01 +01:00
Miklos Szeredi	71d509280f	ovl: use lockless_dereference() for upperdentry Don't open code lockless_dereference() in ovl_upperdentry_dereference(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-11-20 16:40:01 +01:00
Miklos Szeredi	91c7794713	ovl: allow filenames with comma Allow option separator (comma) to be escaped with backslash. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-11-20 16:40:00 +01:00
Miklos Szeredi	521484639e	ovl: fix race in private xattr checks Xattr operations can race with copy up. This does not matter as long as we consistently fiter out "trunsted.overlay.opaque" attribute on upper directories. Previously we checked parent against OVL_PATH_MERGE. This is too general, and prone to race with copy-up. I.e. we found the parent to be on the lower layer but ovl_dentry_real() would return the copied-up dentry, possibly with the "opaque" attribute. So instead use ovl_path_real() and decide to filter the attributes based on the actual type of the dentry we'll use. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-11-20 16:40:00 +01:00
Miklos Szeredi	a105d685a8	ovl: fix remove/copy-up race ovl_remove_and_whiteout() needs to check if upper dentry exists or not after having locked upper parent directory. Previously we used a "type" value computed before locking the upper parent directory, which is susceptible to racing with copy-up. There's a similar check in ovl_check_empty_and_clear(). This one is not actually racy, since copy-up doesn't change the "emptyness" property of a directory. Add a comment to this effect, and check the existence of upper dentry locally to make the code cleaner. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-11-20 16:39:59 +01:00
Miklos Szeredi	ef94b1864d	ovl: rename filesystem type to "overlay" Some distributions carry an "old" format of overlayfs while mainline has a "new" format. The distros will possibly want to keep the old overlayfs alongside the new for compatibility reasons. To make it possible to differentiate the two versions change the name of the new one from "overlayfs" to "overlay". Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Andy Whitcroft <apw@canonical.com>	2014-11-20 16:39:59 +01:00
Miklos Szeredi	3f822c6264	ovl: don't poison cursor ovl_cache_put() can be called from ovl_dir_reset() if the cache needs to be rebuilt. We did list_del() on the cursor, which results in an Oops on the poisoned pointer in ovl_seek_cursor(). Reported-by: Jordi Pujol Palomer <jordipujolp@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Jordi Pujol Palomer <jordipujolp@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-11-05 08:49:38 -05:00
Miklos Szeredi	ac7576f4b1	vfs: make first argument of dir_context.actor typed Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-31 17:48:54 -04:00
Miklos Szeredi	9f2f7d4c8d	ovl: initialize ->is_cursor Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-31 17:47:51 -04:00
Miklos Szeredi	d1b72cc6d8	overlayfs: fix lockdep misannotation In an overlay directory that shadows an empty lower directory, say /mnt/a/empty102, do: touch /mnt/a/empty102/x unlink /mnt/a/empty102/x rmdir /mnt/a/empty102 It's actually harmless, but needs another level of nesting between I_MUTEX_CHILD and I_MUTEX_NORMAL. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-28 18:32:47 -04:00
Miklos Szeredi	c2096537d4	ovl: fix check for cursor ovl_cache_entry.name is now an array not a pointer, so it makes no sense test for it being NULL. Detected by coverity. From: Miklos Szeredi <mszeredi@suse.cz> Fixes: `68bf861107` ("overlayfs: make ovl_cache_entry->name an array instead of +pointer") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-28 18:31:54 -04:00
Al Viro	d45f00ae43	overlayfs: barriers for opening upper-layer directory make sure that a) all stores done by opening struct file don't leak past storing the reference in od->upperfile b) the lockless side has read dependency barrier Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-28 18:27:28 -04:00
Al Viro	db6ec212b5	overlayfs: embed middle into overlay_readdir_data same story... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-24 20:25:23 -04:00
Al Viro	49be4fb9cc	overlayfs: embed root into overlay_readdir_data no sense having it a pointer - all instances have it pointing to local variable in the same stack frame Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-24 20:25:23 -04:00
Al Viro	68bf861107	overlayfs: make ovl_cache_entry->name an array instead of pointer Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-24 20:25:22 -04:00
Al Viro	3d268c9b13	overlayfs: don't hold ->i_mutex over opening the real directory just use it to serialize the assignment Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2014-10-24 20:24:11 -04:00
Miklos Szeredi	69c433ed2e	fs: limit filesystem stacking depth Add a simple read-only counter to super_block that indicates how deep this is in the stack of filesystems. Previously ecryptfs was the only stackable filesystem and it explicitly disallowed multiple layers of itself. Overlayfs, however, can be stacked recursively and also may be stacked on top of ecryptfs or vice versa. To limit the kernel stack usage we must limit the depth of the filesystem stack. Initially the limit is set to 2. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-10-24 00:14:39 +02:00
Erez Zadok	f45827e841	overlayfs: implement show_options This is useful because of the stacking nature of overlayfs. Users like to find out (via /proc/mounts) which lower/upper directory were used at mount time. AV: even failing ovl_parse_opt() could've done some kstrdup() AV: failure of ovl_alloc_entry() should end up with ENOMEM, not EINVAL Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-10-24 00:14:38 +02:00
Andy Whitcroft	cc2596392a	overlayfs: add statfs support Add support for statfs to the overlayfs filesystem. As the upper layer is the target of all write operations assume that the space in that filesystem is the space in the overlayfs. There will be some inaccuracy as overwriting a file will copy it up and consume space we were not expecting, but it is better than nothing. Use the upper layer dentry and mount from the overlayfs root inode, passing the statfs call to that filesystem. Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-10-24 00:14:38 +02:00
Miklos Szeredi	e9be9d5e76	overlay filesystem Overlayfs allows one, usually read-write, directory tree to be overlaid onto another, read-only directory tree. All modifications go to the upper, writable layer. This type of mechanism is most often used for live CDs but there's a wide variety of other uses. The implementation differs from other "union filesystem" implementations in that after a file is opened all operations go directly to the underlying, lower or upper, filesystems. This simplifies the implementation and allows native performance in these cases. The dentry tree is duplicated from the underlying filesystems, this enables fast cached lookups without adding special support into the VFS. This uses slightly more memory than union mounts, but dentries are relatively small. Currently inodes are duplicated as well, but it is a possible optimization to share inodes for non-directories. Opening non directories results in the open forwarded to the underlying filesystem. This makes the behavior very similar to union mounts (with the same limitations vs. fchmod/fchown on O_RDONLY file descriptors). Usage: mount -t overlayfs overlayfs -olowerdir=/lower,upperdir=/upper/upper,workdir=/upper/work /overlay The following cotributions have been folded into this patch: Neil Brown <neilb@suse.de>: - minimal remount support - use correct seek function for directories - initialise is_real before use - rename ovl_fill_cache to ovl_dir_read Felix Fietkau <nbd@openwrt.org>: - fix a deadlock in ovl_dir_read_merged - fix a deadlock in ovl_remove_whiteouts Erez Zadok <ezk@fsl.cs.sunysb.edu> - fix cleanup after WARN_ON Sedat Dilek <sedat.dilek@googlemail.com> - fix up permission to confirm to new API Robin Dong <hao.bigrat@gmail.com> - fix possible leak in ovl_new_inode - create new inode in ovl_link Andy Whitcroft <apw@canonical.com> - switch to __inode_permission() - copy up i_uid/i_gid from the underlying inode AV: - ovl_copy_up_locked() - dput(ERR_PTR(...)) on two failure exits - ovl_clear_empty() - one failure exit forgetting to do unlock_rename(), lack of check for udir being the parent of upper, dropping and regaining the lock on udir (which would require _another_ check for parent being right). - bogus d_drop() in copyup and rename [fix from your mail] - copyup/remove and copyup/rename races [fix from your mail] - ovl_dir_fsync() leaving ERR_PTR() in ->realfile - ovl_entry_free() is pointless - it's just a kfree_rcu() - fold ovl_do_lookup() into ovl_lookup() - manually assigning ->d_op is wrong. Just use ->s_d_op. [patches picked from Miklos]: * copyup/remove and copyup/rename races * bogus d_drop() in copyup and rename Also thanks to the following people for testing and reporting bugs: Jordi Pujol <jordipujolp@gmail.com> Andy Whitcroft <apw@canonical.com> Michal Suchanek <hramrach@centrum.cz> Felix Fietkau <nbd@openwrt.org> Erez Zadok <ezk@fsl.cs.sunysb.edu> Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>	2014-10-24 00:14:38 +02:00

1 2 3 4 5

229 Commits