Go to file
Yang Shi 89fdcd262f mm: shmem: make stat.st_blksize return huge page size if THP is on
Since tmpfs THP was supported in 4.8, hugetlbfs is not the only
filesystem with huge page support anymore.  tmpfs can use huge page via
THP when mounting by "huge=" mount option.

When applications use huge page on hugetlbfs, it just need check the
filesystem magic number, but it is not enough for tmpfs.  Make
stat.st_blksize return huge page size if it is mounted by appropriate
"huge=" option to give applications a hint to optimize the behavior with
THP.

Some applications may not do wisely with THP.  For example, QEMU may
mmap file on non huge page aligned hint address with MAP_FIXED, which
results in no pages are PMD mapped even though THP is used.  Some
applications may mmap file with non huge page aligned offset.  Both
behaviors make THP pointless.

statfs.f_bsize still returns 4KB for tmpfs since THP could be split, and
it also may fallback to 4KB page silently if there is not enough huge
page.  Furthermore, different f_bsize makes max_blocks and free_blocks
calculation harder but without too much benefit.  Returning huge page
size via stat.st_blksize sounds good enough.

Since PUD size huge page for THP has not been supported, now it just
returns HPAGE_PMD_SIZE.

Hugh said:

: Sorry, I have no enthusiasm for this patch; but do I feel strongly
: enough to override you and everyone else to NAK it?  No, I don't feel
: that strongly, maybe st_blksize isn't worth arguing over.
:
: We did look at struct stat when designing huge tmpfs, to see if there
: were any fields that should be adjusted for it; but concluded none.
: Yes, it would sometimes be nice to have a quickly accessible indicator
: for when tmpfs has been mounted huge (scanning /proc/mounts for options
: can be tiresome, agreed); but since tmpfs tries to supply huge (or not)
: pages transparently, no difference seemed right.
:
: So, because st_blksize is a not very useful field of struct stat, with
: "size" in the name, we're going to put HPAGE_PMD_SIZE in there instead
: of PAGE_SIZE, if the tmpfs was mounted with one of the huge "huge"
: options (force or always, okay; within_size or advise, not so much).
: Though HPAGE_PMD_SIZE is no more its "preferred I/O size" or "blocksize
: for file system I/O" than PAGE_SIZE was.
:
: Which we can expect to speed up some applications and disadvantage
: others, depending on how they interpret st_blksize: just like if we
: changed it in the same way on non-huge tmpfs.  (Did I actually try
: changing st_blksize early on, and find it broke something?  If so, I've
: now forgotten what, and a search through commit messages didn't find
: it; but I guess we'll find out soon enough.)
:
: If there were an mstat() syscall, returning a field "preferred
: alignment", then we could certainly agree to put HPAGE_PMD_SIZE in
: there; but in stat()'s st_blksize?  And what happens when (in future)
: mm maps this or that hard-disk filesystem's blocks with a pmd mapping -
: should that filesystem then advertise a bigger st_blksize, despite the
: same disk layout as before?  What happens with DAX?
:
: And this change is not going to help the QEMU suboptimality that
: brought you here (or does QEMU align mmaps according to st_blksize?).
: QEMU ought to work well with kernels without this change, and kernels
: with this change; and I hope it can easily deal with both by avoiding
: that use of MAP_FIXED which prevented the kernel's intended alignment.

[akpm@linux-foundation.org: remove unneeded `else']
Link: http://lkml.kernel.org/r/1524665633-83806-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-07 17:34:35 -07:00
arch mm: introduce ARCH_HAS_PTE_SPECIAL 2018-06-07 17:34:35 -07:00
block Changes for 4.18: 2018-06-05 13:24:20 -07:00
certs
crypto - Introduce arithmetic overflow test helper functions (Rasmus) 2018-06-06 17:27:14 -07:00
Documentation mm/docs: describe memory.low refinements 2018-06-07 17:34:35 -07:00
drivers zram: introduce zram memory tracking 2018-06-07 17:34:34 -07:00
firmware
fs mm: restructure memfd code 2018-06-07 17:34:35 -07:00
include mm: memory.low hierarchical behavior 2018-06-07 17:34:35 -07:00
init Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
ipc Merge branch 'timers-2038-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-04 21:02:18 -07:00
kernel mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct 2018-06-07 17:34:34 -07:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
LICENSES LICENSES: Add Linux-OpenIB license text 2018-04-27 16:41:53 -06:00
mm mm: shmem: make stat.st_blksize return huge page size if THP is on 2018-06-07 17:34:35 -07:00
net net/9p/trans_xen.c: don't inclide rwlock.h directly 2018-06-07 17:34:34 -07:00
samples samples/bpf: xdpsock: use skb Tx path for XDP_SKB 2018-06-05 15:48:57 +02:00
scripts scripts: use SPDX tag in get_maintainer and checkpatch 2018-06-07 17:34:33 -07:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
sound media updates for v4.18-rc1 2018-06-07 12:34:37 -07:00
tools powerpc updates for 4.18 2018-06-07 10:23:33 -07:00
usr kbuild: rename built-in.o to built-in.a 2018-03-26 02:01:19 +09:00
virt Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-06-04 15:23:48 -07:00
.clang-format clang-format: add configuration file 2018-04-11 10:28:35 -07:00
.cocciconfig
.get_maintainer.ignore
.gitattributes
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap Merge branch 'asoc-4.17' into asoc-4.18 for compress dependencies 2018-04-26 12:24:28 +01:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS
Kbuild
Kconfig kconfig: add basic helper macros to scripts/Kconfig.include 2018-05-29 03:31:19 +09:00
MAINTAINERS media updates for v4.18-rc1 2018-06-07 12:34:37 -07:00
Makefile Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-06-06 18:39:49 -07:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.