Remove unnecessary check at the end of codegen() routine which makes codegen()
to always fail and exit bpftool with error code. Positive value of variable
n is not an indicator of a failure.
Fixes: 2c4779eff8 ("tools, bpftool: Exit on error in function codegen")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Link: https://lore.kernel.org/bpf/20200612201603.680852-1-andriin@fb.com
Add missed bpf_map_charge_init() in sock_hash_alloc() and
correspondingly bpf_map_charge_finish() on ENOMEM.
It was found accidentally while working on unrelated selftest that
checks "map->memory.pages > 0" is true for all map types.
Before:
# bpftool m l
...
3692: sockhash name m_sockhash flags 0x0
key 4B value 4B max_entries 8 memlock 0B
After:
# bpftool m l
...
84: sockmap name m_sockmap flags 0x0
key 4B value 4B max_entries 8 memlock 4096B
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200612000857.2881453-1-rdna@fb.com
The stream parser infrastructure isn't set up to deal with UDP
sockets, so we mustn't try to attach programs to them.
I remember making this change at some point, but I must have lost
it while rebasing or something similar.
Fixes: 7b98cd42b0 ("bpf: sockmap: Add UDP support")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200611172520.327602-1-lmb@cloudflare.com
If the peer is closed, we will never get more data, so
tcp_bpf_wait_data will get stuck forever. In case we passed
MSG_DONTWAIT to recv(), we get EAGAIN but we should actually get
0.
>From man 2 recv:
RETURN VALUE
When a stream socket peer has performed an orderly shutdown, the
return value will be 0 (the traditional "end-of-file" return).
This patch makes tcp_bpf_wait_data always return 1 when the peer
socket has been shutdown. Either we have data available, and it would
have returned 1 anyway, or there isn't, in which case we'll call
tcp_recvmsg which does the right thing in this situation.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/26038a28c21fea5d04d4bd4744c5686d3f2e5504.1591784177.git.sd@queasysnail.net
Currently, the codegen function might fail and return an error. But its
callers continue without checking its return value. Since codegen can
fail only in the unlikely case of the system running out of memory or
the static template being malformed, just exit(-1) directly from codegen
and make it void-returning.
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200611103341.21532-1-tklauser@distanz.ch
Propagate sock_alloc_send_skb error code, not set it to
EAGAIN unconditionally, when fail to allocate skb, which
might cause that user space unnecessary loops.
Fixes: 35fcde7f8d ("xsk: support for Tx")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1591852266-24017-1-git-send-email-lirongqing@baidu.com
Free the memory allocated for the template on error paths in function
codegen.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200610130804.21423-1-tklauser@distanz.ch
When cgroup_skb/egress triggers the MAC header is not set. Added a
test that asserts reading MAC header is a -EFAULT but NET header
succeeds. The test result from within the eBPF program is stored in
an 1-element array map that the userspace then reads and asserts on.
Another assertion is added that reading from a large offset, past
the end of packet, returns -EFAULT.
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/9028ccbea4385a620e69c0a104f469ffd655c01e.1591812755.git.zhuyifei@google.com
Added a check in the switch case on start_header that checks for
the existence of the header, and in the case that MAC is not set
and the caller requests for MAC, -EFAULT. If the caller requests
for NET then MAC's existence is completely ignored.
There is no function to check NET header's existence and as far
as cgroup_skb/egress is concerned it should always be set.
Removed for ptr >= the start of header, considering offset is
bounded unsigned and should always be true. len <= end - mac is
redundant to ptr + len <= end.
Fixes: 3eee1f75f2 ("bpf: fix bpf_skb_load_bytes_relative pkt length check")
Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/76bb820ddb6a95f59a772ecbd8c8a336f646b362.1591812755.git.zhuyifei@google.com
This allows transparent cross-compilation with CROSS_COMPILE by
relying on 7ed1c1901f ("tools: fix cross-compile var clobbering").
Same change was applied to tools/bpf/bpftool/Makefile in
9e88b9312a ("tools: bpftool: do not force gcc as CC").
Signed-off-by: Brett Mastbergen <brett.mastbergen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200609213506.3299-1-brett.mastbergen@gmail.com
Handle a GCC quirk of emitting extra volatile modifier in DWARF (and
subsequently preserved in BTF by pahole) for function pointers marked as
__attribute__((noreturn)). This was the way to mark such functions before GCC
2.5 added noreturn attribute. Drop such func_proto modifiers, similarly to how
it's done for array (also to handle GCC quirk/bug).
Such volatile attribute is emitted by GCC only, so existing selftests can't
express such test. Simple repro is like this (compiled with GCC + BTF
generated by pahole):
struct my_struct {
void __attribute__((noreturn)) (*fn)(int);
};
struct my_struct a;
Without this fix, output will be:
struct my_struct {
voidvolatile (*fn)(int);
};
With the fix:
struct my_struct {
void (*fn)(int);
};
Fixes: 351131b51c ("libbpf: add btf_dump API for BTF-to-C conversion")
Reported-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/bpf/20200610052335.2862559-1-andriin@fb.com
V2:
- Defer changing BPF-syscall to start at file-descriptor 1
- Use {} to zero initialise struct.
The recent commit fbee97feed ("bpf: Add support to attach bpf program to a
devmap entry"), introduced ability to attach (and run) a separate XDP
bpf_prog for each devmap entry. A bpf_prog is added via a file-descriptor.
As zero were a valid FD, not using the feature requires using value minus-1.
The UAPI is extended via tail-extending struct bpf_devmap_val and using
map->value_size to determine the feature set.
This will break older userspace applications not using the bpf_prog feature.
Consider an old userspace app that is compiled against newer kernel
uapi/bpf.h, it will not know that it need to initialise the member
bpf_prog.fd to minus-1. Thus, users will be forced to update source code to
get program running on newer kernels.
This patch remove the minus-1 checks, and have zero mean feature isn't used.
Followup patches either for kernel or libbpf should handle and avoid
returning file-descriptor zero in the first place.
Fixes: fbee97feed ("bpf: Add support to attach bpf program to a devmap entry")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/159170950687.2102545.7235914718298050113.stgit@firesoul
When using BPF_PROG_ATTACH to attach a program to a cgroup in
BPF_F_ALLOW_MULTI mode, it is not possible to replace a program
with itself. This is because the check for duplicate programs
doesn't take the replacement program into account.
Replacing a program with itself might seem weird, but it has
some uses: first, it allows resetting the associated cgroup storage.
Second, it makes the API consistent with the non-ALLOW_MULTI usage,
where it is possible to replace a program with itself. Third, it
aligns BPF_PROG_ATTACH with bpf_link, where replacing itself is
also supported.
Sice this code has been refactored a few times this change will
only apply to v5.7 and later. Adjustments could be made to
commit 1020c1f24a ("bpf: Simplify __cgroup_bpf_attach") and
commit d7bf2c10af ("bpf: allocate cgroup storage entries on attaching bpf programs")
as well as commit 324bda9e6c ("bpf: multi program support for cgroup+bpf")
Fixes: af6eea5743 ("bpf: Implement bpf_link-based cgroup BPF program attachment")
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200608162202.94002-1-lmb@cloudflare.com
This is a new context that does not handle metadata at the moment, so
mark data_meta invalid.
Fixes: fbee97feed ("bpf: Add support to attach bpf program to a devmap entry")
Signed-off-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200608151723.9539-1-dsahern@kernel.org
Commit 60d53e2c3b ("tracing/probe: Split trace_event related data from
trace_probe") removed the trace_[ku]probe structure from the
trace_event_call->data pointer. As bpf_get_[ku]probe_info() were
forgotten in that change, fix them now. These functions are currently
only used by the bpf_task_fd_query() syscall handler to collect
information about a perf event.
Fixes: 60d53e2c3b ("tracing/probe: Split trace_event related data from trace_probe")
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/20200608124531.819838-1-jean-philippe@linaro.org
bpf_iter requires the kernel BTF to be generated with
pahole >= 1.16, since otherwise the function definitions
that the iterator attaches to are not included.
This failure mode is indistiguishable from trying to attach
to an iterator that really doesn't exist.
Since it's really easy to miss this requirement, bump the
pahole version check used at build time to at least 1.16.
Fixes: 15d83c4d7c ("bpf: Allow loading of a bpf_iter program")
Suggested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200608094257.47366-1-lmb@cloudflare.com
We can end up modifying the sockhash bucket list from two CPUs when a
sockhash is being destroyed (sock_hash_free) on one CPU, while a socket
that is in the sockhash is unlinking itself from it on another CPU
it (sock_hash_delete_from_link).
This results in accessing a list element that is in an undefined state as
reported by KASAN:
| ==================================================================
| BUG: KASAN: wild-memory-access in sock_hash_free+0x13c/0x280
| Write of size 8 at addr dead000000000122 by task kworker/2:1/95
|
| CPU: 2 PID: 95 Comm: kworker/2:1 Not tainted 5.7.0-rc7-02961-ge22c35ab0038-dirty #691
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
| Workqueue: events bpf_map_free_deferred
| Call Trace:
| dump_stack+0x97/0xe0
| ? sock_hash_free+0x13c/0x280
| __kasan_report.cold+0x5/0x40
| ? mark_lock+0xbc1/0xc00
| ? sock_hash_free+0x13c/0x280
| kasan_report+0x38/0x50
| ? sock_hash_free+0x152/0x280
| sock_hash_free+0x13c/0x280
| bpf_map_free_deferred+0xb2/0xd0
| ? bpf_map_charge_finish+0x50/0x50
| ? rcu_read_lock_sched_held+0x81/0xb0
| ? rcu_read_lock_bh_held+0x90/0x90
| process_one_work+0x59a/0xac0
| ? lock_release+0x3b0/0x3b0
| ? pwq_dec_nr_in_flight+0x110/0x110
| ? rwlock_bug.part.0+0x60/0x60
| worker_thread+0x7a/0x680
| ? _raw_spin_unlock_irqrestore+0x4c/0x60
| kthread+0x1cc/0x220
| ? process_one_work+0xac0/0xac0
| ? kthread_create_on_node+0xa0/0xa0
| ret_from_fork+0x24/0x30
| ==================================================================
Fix it by reintroducing spin-lock protected critical section around the
code that removes the elements from the bucket on sockhash free.
To do that we also need to defer processing of removed elements, until out
of atomic context so that we can unlink the socket from the map when
holding the sock lock.
Fixes: 90db6d772f ("bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200607205229.2389672-3-jakub@cloudflare.com
When sockhash gets destroyed while sockets are still linked to it, we will
walk the bucket lists and delete the links. However, we are not freeing the
list elements after processing them, leaking the memory.
The leak can be triggered by close()'ing a sockhash map when it still
contains sockets, and observed with kmemleak:
unreferenced object 0xffff888116e86f00 (size 64):
comm "race_sock_unlin", pid 223, jiffies 4294731063 (age 217.404s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
81 de e8 41 00 00 00 00 c0 69 2f 15 81 88 ff ff ...A.....i/.....
backtrace:
[<00000000dd089ebb>] sock_hash_update_common+0x4ca/0x760
[<00000000b8219bd5>] sock_hash_update_elem+0x1d2/0x200
[<000000005e2c23de>] __do_sys_bpf+0x2046/0x2990
[<00000000d0084618>] do_syscall_64+0xad/0x9a0
[<000000000d96f263>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
Fix it by freeing the list element when we're done with it.
Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200607205229.2389672-2-jakub@cloudflare.com
When user application calls read() with MSG_PEEK flag to read data
of bpf sockmap socket, kernel panic happens at
__tcp_bpf_recvmsg+0x12c/0x350. sk_msg is not removed from ingress_msg
queue after read out under MSG_PEEK flag is set. Because it's not
judged whether sk_msg is the last msg of ingress_msg queue, the next
sk_msg may be the head of ingress_msg queue, whose memory address of
sg page is invalid. So it's necessary to add check codes to prevent
this problem.
[20759.125457] BUG: kernel NULL pointer dereference, address:
0000000000000008
[20759.132118] CPU: 53 PID: 51378 Comm: envoy Tainted: G E
5.4.32 #1
[20759.140890] Hardware name: Inspur SA5212M4/YZMB-00370-109, BIOS
4.1.12 06/18/2017
[20759.149734] RIP: 0010:copy_page_to_iter+0xad/0x300
[20759.270877] __tcp_bpf_recvmsg+0x12c/0x350
[20759.276099] tcp_bpf_recvmsg+0x113/0x370
[20759.281137] inet_recvmsg+0x55/0xc0
[20759.285734] __sys_recvfrom+0xc8/0x130
[20759.290566] ? __audit_syscall_entry+0x103/0x130
[20759.296227] ? syscall_trace_enter+0x1d2/0x2d0
[20759.301700] ? __audit_syscall_exit+0x1e4/0x290
[20759.307235] __x64_sys_recvfrom+0x24/0x30
[20759.312226] do_syscall_64+0x55/0x1b0
[20759.316852] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: dihu <anny.hu@linux.alibaba.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200605084625.9783-1-anny.hu@linux.alibaba.com
Fix test race, in which background poll can get either 5 or 6 samples,
depending on timing of notification. Prevent this by open-coding sample
triggering and forcing notification for the very last sample only.
Also switch to using atomic increments and exchanges for more obviously
reliable counting and checking. Additionally, check expected processed sample
counters for single-threaded use cases as well.
Fixes: 9a5f25ad30 ("selftests/bpf: Fix sample_cnt shared between two threads")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200608003615.3549991-1-andriin@fb.com
This code returns success if the "info_aux" allocation fails but it
should return -ENOMEM.
Fixes: 8c1b6e69dc ("bpf: Compare BTF types of functions arguments with actual types")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200604085436.GA943001@mwanda
A recent commit added new variables only used if CONFIG_NETDEVICES is
set. A simple fix would be to only declare these variables if the same
condition is valid but Alexei suggested an even simpler solution:
since CONFIG_NETDEVICES doesn't change anything in .h I think the
best is to remove #ifdef CONFIG_NETDEVICES from net/core/filter.c
and rely on sock_bindtoindex() returning ENOPROTOOPT in the extreme
case of oddly configured kernels.
Fixes: 70c58997c1 ("bpf: Allow SO_BINDTODEVICE opt in bpf_setsockopt")
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200603190347.2310320-1-matthieu.baerts@tessares.net
Pull networking updates from David Miller:
1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
Augusto von Dentz.
2) Add GSO partial support to igc, from Sasha Neftin.
3) Several cleanups and improvements to r8169 from Heiner Kallweit.
4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
device self-test. From Andrew Lunn.
5) Start moving away from custom driver versions, use the globally
defined kernel version instead, from Leon Romanovsky.
6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin.
7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.
8) Add sriov and vf support to hinic, from Luo bin.
9) Support Media Redundancy Protocol (MRP) in the bridging code, from
Horatiu Vultur.
10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.
11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
Dubroca. Also add ipv6 support for espintcp.
12) Lots of ReST conversions of the networking documentation, from Mauro
Carvalho Chehab.
13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
from Doug Berger.
14) Allow to dump cgroup id and filter by it in inet_diag code, from
Dmitry Yakunin.
15) Add infrastructure to export netlink attribute policies to
userspace, from Johannes Berg.
16) Several optimizations to sch_fq scheduler, from Eric Dumazet.
17) Fallback to the default qdisc if qdisc init fails because otherwise
a packet scheduler init failure will make a device inoperative. From
Jesper Dangaard Brouer.
18) Several RISCV bpf jit optimizations, from Luke Nelson.
19) Correct the return type of the ->ndo_start_xmit() method in several
drivers, it's netdev_tx_t but many drivers were using
'int'. From Yunjian Wang.
20) Add an ethtool interface for PHY master/slave config, from Oleksij
Rempel.
21) Add BPF iterators, from Yonghang Song.
22) Add cable test infrastructure, including ethool interfaces, from
Andrew Lunn. Marvell PHY driver is the first to support this
facility.
23) Remove zero-length arrays all over, from Gustavo A. R. Silva.
24) Calculate and maintain an explicit frame size in XDP, from Jesper
Dangaard Brouer.
25) Add CAP_BPF, from Alexei Starovoitov.
26) Support terse dumps in the packet scheduler, from Vlad Buslov.
27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.
28) Add devm_register_netdev(), from Bartosz Golaszewski.
29) Minimize qdisc resets, from Cong Wang.
30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
eliminate set_fs/get_fs calls. From Christoph Hellwig.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
selftests: net: ip_defrag: ignore EPERM
net_failover: fixed rollback in net_failover_open()
Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
vmxnet3: allow rx flow hash ops only when rss is enabled
hinic: add set_channels ethtool_ops support
selftests/bpf: Add a default $(CXX) value
tools/bpf: Don't use $(COMPILE.c)
bpf, selftests: Use bpf_probe_read_kernel
s390/bpf: Use bcr 0,%0 as tail call nop filler
s390/bpf: Maintain 8-byte stack alignment
selftests/bpf: Fix verifier test
selftests/bpf: Fix sample_cnt shared between two threads
bpf, selftests: Adapt cls_redirect to call csum_level helper
bpf: Add csum_level helper for fixing up csum levels
bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
crypto/chtls: IPv6 support for inline TLS
Crypto/chcr: Fixes a coccinile check error
Crypto/chcr: Fixes compilations warnings
...
Pull comedi uaccess cleanups from Al Viro:
"Comedi compat ioctls done saner - killing the single biggest pile of
__get_user/__put_user outside of arch/* in the process"
* 'uaccess.comedi' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
comedi: get rid of compat_alloc_user_space() mess in COMEDI_CMD{,TEST} compat
comedi: do_cmd_ioctl(): lift copyin/copyout into the caller
comedi: do_cmdtest_ioctl(): lift copyin/copyout into the caller
comedi: lift copy_from_user() into callers of __comedi_get_user_cmd()
comedi: get rid of compat_alloc_user_space() mess in COMEDI_INSNLIST compat
comedi: get rid of compat_alloc_user_space() mess in COMEDI_INSN compat
comedi: get rid of compat_alloc_user_space() mess in COMEDI_RANGEINFO compat
comedi: get rid of compat_alloc_user_space() mess in COMEDI_CHANINFO compat
comedi: get rid of indirection via translated_ioctl()
comedi: move compat ioctl handling to native fops
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested virtualization
- Nested AMD event injection facelift, building on the rework of generic code
and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page fault
work, will come next week.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp
ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1
5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4
7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ
asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy
CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg==
=v7Wn
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Move the arch-specific code into arch/arm64/kvm
- Start the post-32bit cleanup
- Cherry-pick a few non-invasive pre-NV patches
x86:
- Rework of TLB flushing
- Rework of event injection, especially with respect to nested
virtualization
- Nested AMD event injection facelift, building on the rework of
generic code and fixing a lot of corner cases
- Nested AMD live migration support
- Optimization for TSC deadline MSR writes and IPIs
- Various cleanups
- Asynchronous page fault cleanups (from tglx, common topic branch
with tip tree)
- Interrupt-based delivery of asynchronous "page ready" events (host
side)
- Hyper-V MSRs and hypercalls for guest debugging
- VMX preemption timer fixes
s390:
- Cleanups
Generic:
- switch vCPU thread wakeup from swait to rcuwait
The other architectures, and the guest side of the asynchronous page
fault work, will come next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits)
KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test
KVM: check userspace_addr for all memslots
KVM: selftests: update hyperv_cpuid with SynDBG tests
x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls
x86/kvm/hyper-v: enable hypercalls regardless of hypercall page
x86/kvm/hyper-v: Add support for synthetic debugger interface
x86/hyper-v: Add synthetic debugger definitions
KVM: selftests: VMX preemption timer migration test
KVM: nVMX: Fix VMX preemption timer migration
x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit
KVM: x86/pmu: Support full width counting
KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in
KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT
KVM: x86: acknowledgment mechanism for async pf page ready notifications
KVM: x86: interrupt based APF 'page ready' event delivery
KVM: introduce kvm_read_guest_offset_cached()
KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present()
KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info
Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously"
KVM: VMX: Replace zero-length array with flexible-array
...
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAl7WhbkTHHdlaS5saXVA
a2VybmVsLm9yZwAKCRB2FHBfkEGgXlUnB/0R8dBVSeRfNmyJaadBWKFc/LffwKLD
CQ8PVv22ffkCaEYV2tpnhS6NmkERLNdson4Uo02tVUsjOJ4CrWHTn7aKqYWZyA+O
qv/PiD9TBXJVYMVP2kkyaJlK5KoqeAWBr2kM16tT0cxQmlhE7g0Xo2wU9vhRbU+4
i4F0jffe4lWps65TK392CsPr6UEv1HSel191Py5zLzYqChT+L8WfahmBt3chhsV5
TIUJYQvBwxecFRla7yo+4sUn37ZfcTqD1hCWSr0zs4psW0ge7d80kuaNZS+EqxND
fGm3Bp1BlUuDKsJ/D+AaHLCR47PUZ9t9iMDjZS/ovYglLFwi+h3tAV+W
=LwVR
-----END PGP SIGNATURE-----
Merge tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyper-v updates from Wei Liu:
- a series from Andrea to support channel reassignment
- a series from Vitaly to clean up Vmbus message handling
- a series from Michael to clean up and augment hyperv-tlfs.h
- patches from Andy to clean up GUID usage in Hyper-V code
- a few other misc patches
* tag 'hyperv-next-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (29 commits)
Drivers: hv: vmbus: Resolve more races involving init_vp_index()
Drivers: hv: vmbus: Resolve race between init_vp_index() and CPU hotplug
vmbus: Replace zero-length array with flexible-array
Driver: hv: vmbus: drop a no long applicable comment
hyper-v: Switch to use UUID types directly
hyper-v: Replace open-coded variant of %*phN specifier
hyper-v: Supply GUID pointer to printf() like functions
hyper-v: Use UUID API for exporting the GUID (part 2)
asm-generic/hyperv: Add definitions for Get/SetVpRegister hypercalls
x86/hyperv: Split hyperv-tlfs.h into arch dependent and independent files
x86/hyperv: Remove HV_PROCESSOR_POWER_STATE #defines
KVM: x86: hyperv: Remove duplicate definitions of Reference TSC Page
drivers: hv: remove redundant assignment to pointer primary_channel
scsi: storvsc: Re-init stor_chns when a channel interrupt is re-assigned
Drivers: hv: vmbus: Introduce the CHANNELMSG_MODIFYCHANNEL message type
Drivers: hv: vmbus: Synchronize init_vp_index() vs. CPU hotplug
Drivers: hv: vmbus: Remove the unused HV_LOCALIZED channel affinity logic
PCI: hv: Prepare hv_compose_msi_msg() for the VMBus-channel-interrupt-to-vCPU reassignment functionality
Drivers: hv: vmbus: Use a spin lock for synchronizing channel scheduling vs. channel removal
hv_utils: Always execute the fcopy and vss callbacks in a tasklet
...
By far the biggest change in this cycle are the changes that allow much
earlier debug of systems that are hooked up via UART by taking advantage
of the earlycon framework to implement the kgdb I/O hooks before handing
over to the regular polling I/O drivers once they are available. When
discussing Doug's work we also found and fixed an broken
raw_smp_processor_id() sequence in in_dbg_master().
Also included are a collection of much smaller fixes and tweaks: a
couple of tweaks to ged rid of doc gen or coccicheck warnings, future
proof some internal calculations that made implicit power-of-2
assumptions and eliminate some rather weird handling of magic
environment variables in kdb.
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEELzVBU1D3lWq6cKzwfOMlXTn3iKEFAl7WfPsACgkQfOMlXTn3
iKGhvBAAmalPhPvJ74djkSfSuz+fNVgjer5wKGQNhz4lSd+0W3lCkY8T2fkUIpL5
jR3Q0gzJSA2WMSA7RrIwegDt0kCiQI0rtRKDkQxo33HBVSLlh2p5oXg7P5lQ4uOi
QZyPI176V1KncFZjPKK2HzhTjoPNlx8GqVys6PBQETvTvxKR3f9qoq5qOKl/f9kQ
Q4Dzb/npl6/XGJnQfdnkRcrXXtlK08yRxfXQyBEv0X6U9PUe1xmEZb1i9WBrrOYv
u6N94fy2z6vqRgnbv4F6FTiQEHR1VFW2nPGpJ6GFv3KGFpT4QSWuyqTjm1Biee2y
Gjn5ACAhW6tdPL+tCK3MRNGih7MaKoR01SnXz5D4T9V1zFTOhW7vyw+t3zoLfR7R
fJoymQWKyfWbtj0Do8POiF31V+hvGVuqhzG/lTpnynSRJL38x4il6sFmtuRxMW+8
vyxaetrPX+omf+fq1ueYTJS5Y5bl1Zp3avajD3VPXq2Vq2m4zl++AOlzTOJDF5A+
P9RbwfWJh5Tm3VdCCWv849IDCK3R15DjoNLsuJkNRzqAYrJMVjA/QWyIAT14KR3z
Nx3ix/QVKFkNnP5g1N38i2AvWRWZ/QuAmAFRgsmgnYPapeeX4EPtgdmqnloV9AAx
CgO7KgUJF4LSIKTfoeWNJ4mpgSVR8zxkOR9w6DX0EQHDbfwlx8o=
=uLAB
-----END PGP SIGNATURE-----
Merge tag 'kgdb-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
Pull kgdb updates from Daniel Thompson:
"By far the biggest change in this cycle are the changes that allow
much earlier debug of systems that are hooked up via UART by taking
advantage of the earlycon framework to implement the kgdb I/O hooks
before handing over to the regular polling I/O drivers once they are
available. When discussing Doug's work we also found and fixed an
broken raw_smp_processor_id() sequence in in_dbg_master().
Also included are a collection of much smaller fixes and tweaks: a
couple of tweaks to ged rid of doc gen or coccicheck warnings, future
proof some internal calculations that made implicit power-of-2
assumptions and eliminate some rather weird handling of magic
environment variables in kdb"
* tag 'kgdb-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux:
kdb: Remove the misfeature 'KDBFLAGS'
kdb: Cleanup math with KDB_CMD_HISTORY_COUNT
serial: amba-pl011: Support kgdboc_earlycon
serial: 8250_early: Support kgdboc_earlycon
serial: qcom_geni_serial: Support kgdboc_earlycon
serial: kgdboc: Allow earlycon initialization to be deferred
Documentation: kgdboc: Document new kgdboc_earlycon parameter
kgdb: Don't call the deinit under spinlock
kgdboc: Disable all the early code when kgdboc is a module
kgdboc: Add kgdboc_earlycon to support early kgdb using boot consoles
kgdboc: Remove useless #ifdef CONFIG_KGDB_SERIAL_CONSOLE in kgdboc
kgdb: Prevent infinite recursive entries to the debugger
kgdb: Delay "kgdbwait" to dbg_late_init() by default
kgdboc: Use a platform device to handle tty drivers showing up late
Revert "kgdboc: disable the console lock when in kgdb"
kgdb: Disable WARN_CONSOLE_UNLOCKED for all kgdb
kgdb: Return true in kgdb_nmi_poll_knock()
kgdb: Drop malformed kernel doc comment
kgdb: Fix spurious true from in_dbg_master()
Pull parsic updates from Helge Deller:
"Enable the sysctl file interface for panic_on_stackoverflow for
parisc, a warning fix and a bunch of documentation updates since the
parisc website is now at https://parisc.wiki.kernel.org"
* 'parisc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: MAINTAINERS: Update references to parisc website
parisc: module: Update references to parisc website
parisc: hardware: Update references to parisc website
parisc: firmware: Update references to parisc website
parisc: Kconfig: Update references to parisc website
parisc: add sysctl file interface panic_on_stackoverflow
parisc: use -fno-strict-aliasing for decompressor
parisc: suppress error messages for 'make clean'
- added support for MIPSr5 and P5600 cores
- converted Loongson PCI driver into a PCI host driver using the generic
PCI framework
- added emulation of CPUCFG command for Loogonson64 cpus
- removed of LASAT, PMC MSP71xx and NEC MARKEINS/EMMA
- ioremap cleanup
- fix for a race between two threads faulting the same page
- various cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl7WK54aHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAbjA/9EEFeqNg9UNUH6/TS18QV
qkxKp0+LC4Jk+SduzLyYsYy6l/dSaKYl8m9jyJsWjM6BvBZTcMJJOnzIPRafI0s+
MK8GCSZunAkm25DsDvfobQUkbQ/UHjY/fuRpNslbDcsYqIKv90hUMd21ccXY6KC5
RY+aMlpjgksg1X8JJ7k1Rs05sXyUPqpESteyqehF1b/+Iyv7H2L3v5EvQwvPDs6f
TyVgNJU2B3RCU6/uAcWmHdVLxXd+Y8fM0vC8DCO0pg0rGf4be0FbZztHmeq6r2wy
g7wsO7acKWGzulFQD5ftVSQ6i8KHIDNDePmDMtU5oFcXkzUDdGvd3j3Gst19/nve
ZftNmQHOY1JqGUOhdq1fDG/4M3Vc5bvh3W6eMG22TuMLEWsOF8teY8uUa/vxOb+B
2NsJ9q6ylRS7RDWWOrApJWfFYPvhr5wlLxT+azWNa9y3bjV8vDLjNdU0mRLA1nsu
yLzYMwIhtWfZhkJZ+xJVSmQ6LjAHDN5TF/LEx/9itLg5t9wrEosFPAtOv8V15hy4
KBNvvWeoy7RRmBTNuKh7r9Ui4jw7GgxL4D1OwzCsF//GAiGyuuh0zMuUE8EXA6K5
MpdGt+bSOcLl8ILTtGir8e4MXLawDH8n94f8QWLb9FcOvU4KHUjRKU7EQ6dyD5dk
a7xskGLXWdVO3IJ/Xvxcaeo=
=eAtN
-----END PGP SIGNATURE-----
Merge tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- added support for MIPSr5 and P5600 cores
- converted Loongson PCI driver into a PCI host driver using the
generic PCI framework
- added emulation of CPUCFG command for Loogonson64 cpus
- removed of LASAT, PMC MSP71xx and NEC MARKEINS/EMMA
- ioremap cleanup
- fix for a race between two threads faulting the same page
- various cleanups and fixes
* tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (143 commits)
MIPS: ralink: drop ralink_clk_init for mt7621
MIPS: ralink: bootrom: mark a function as __init to save some memory
MIPS: Loongson64: Reorder CPUCFG model match arms
MIPS: Expose Loongson CPUCFG availability via HWCAP
MIPS: Loongson64: Guard against future cores without CPUCFG
MIPS: Fix build warning about "PTR_STR" redefinition
MIPS: Loongson64: Remove not used pci.c
MIPS: Loongson64: Define PCI_IOBASE
MIPS: CPU_LOONGSON2EF need software to maintain cache consistency
MIPS: DTS: Fix build errors used with various configs
MIPS: Loongson64: select NO_EXCEPT_FILL
MIPS: Fix IRQ tracing when call handle_fpe() and handle_msa_fpe()
MIPS: mm: add page valid judgement in function pte_modify
mm/memory.c: Add memory read privilege on page fault handling
mm/memory.c: Update local TLB if PTE entry exists
MIPS: Do not flush tlb page when updating PTE entry
MIPS: ingenic: Default to a generic board
MIPS: ingenic: Add support for GCW Zero prototype
MIPS: ingenic: DTS: Add memory info of GCW Zero
MIPS: Loongson64: Switch to generic PCI driver
...
Pull ia64 build regression fix from Al Viro:
"Fix a braino in ia64 uaccess csum changes"
* 'uaccess.csum' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a braino in ia64 uaccess csum changes
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXtYhfgAKCRCRxhvAZXjc
oghSAP9uVX3vxYtEtNvu9WtEn1uYZcSKZoF1YrcgY7UfSmna0gEAruzyZcai4CJL
WKv+4aRq2oYk+hsqZDycAxIsEgWvNg8=
=ZWj3
-----END PGP SIGNATURE-----
Merge tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull thread updates from Christian Brauner:
"We have been discussing using pidfds to attach to namespaces for quite
a while and the patches have in one form or another already existed
for about a year. But I wanted to wait to see how the general api
would be received and adopted.
This contains the changes to make it possible to use pidfds to attach
to the namespaces of a process, i.e. they can be passed as the first
argument to the setns() syscall.
When only a single namespace type is specified the semantics are
equivalent to passing an nsfd. That means setns(nsfd, CLONE_NEWNET)
equals setns(pidfd, CLONE_NEWNET).
However, when a pidfd is passed, multiple namespace flags can be
specified in the second setns() argument and setns() will attach the
caller to all the specified namespaces all at once or to none of them.
Specifying 0 is not valid together with a pidfd. Here are just two
obvious examples:
setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET);
setns(pidfd, CLONE_NEWUSER);
Allowing to also attach subsets of namespaces supports various
use-cases where callers setns to a subset of namespaces to retain
privilege, perform an action and then re-attach another subset of
namespaces.
Apart from significantly reducing the number of syscalls needed to
attach to all currently supported namespaces (eight "open+setns"
sequences vs just a single "setns()"), this also allows atomic setns
to a set of namespaces, i.e. either attaching to all namespaces
succeeds or we fail without having changed anything.
This is centered around a new internal struct nsset which holds all
information necessary for a task to switch to a new set of namespaces
atomically. Fwiw, with this change a pidfd becomes the only token
needed to interact with a container. I'm expecting this to be
picked-up by util-linux for nsenter rather soon.
Associated with this change is a shiny new test-suite dedicated to
setns() (for pidfds and nsfds alike)"
* tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/pidfd: add pidfd setns tests
nsproxy: attach to namespaces via pidfds
nsproxy: add struct nsset
- Optimize the task wakeup CPU selection logic, to improve scalability and
reduce wakeup latency spikes
- PELT enhancements
- CFS bandwidth handling fixes
- Optimize the wakeup path by remove rq->wake_list and replacing it with ->ttwu_pending
- Optimize IPI cross-calls by making flush_smp_call_function_queue()
process sync callbacks first.
- Misc fixes and enhancements.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7WPL0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1i0ThAAs0fbvMzNJ5SWFdwOQ4KZIlA+Im4dEBMK
sx/XAZqa/hGxvkm1jS0RDVQl1V1JdOlru5UF4C42ctnAFGtBBHDriO5rn9oCpkSw
DAoLc4eZqzldIXN6sDZ0xMtC14Eu15UAP40OyM4qxBc4GqGlOnnale6Vhn+n+pLQ
jAuZlMJIkmmzeA6cuvtultevrVh+QUqJ/5oNUANlTER4OM48umjr5rNTOb8cIW53
9K3vbS3nmqSvJuIyqfRFoMy5GFM6+Jj2+nYuq8aTuYLEtF4qqWzttS3wBzC9699g
XYRKILkCK8ZP4RB5Ps/DIKj6maZGZoICBxTJEkIgXujJlxlKKTD3mddk+0LBXChW
Ijznanxn67akoAFpqi/Dnkhieg7cUrE9v1OPRS2J0xy550synSPFcSgOK3viizga
iqbjptY4scUWkCwHQNjABerxc7MWzrwbIrRt+uNvCaqJLweUh0GnEcV5va8R+4I8
K20XwOdrzuPLo5KdDWA/BKOEv49guHZDvoykzlwMlR3gFfwHS/UsjzmSQIWK3gZG
9OMn8ibO2f1OzhRcEpDLFzp7IIj6NJmPFVSW+7xHyL9/vTveUx3ZXPLteb2qxJVP
BYPsduVx8YeGRBlLya0PJriB23ajQr0lnHWo15g0uR9o/0Ds1ephcymiF3QJmCaA
To3CyIuQN8M=
=C2OP
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The changes in this cycle are:
- Optimize the task wakeup CPU selection logic, to improve
scalability and reduce wakeup latency spikes
- PELT enhancements
- CFS bandwidth handling fixes
- Optimize the wakeup path by remove rq->wake_list and replacing it
with ->ttwu_pending
- Optimize IPI cross-calls by making flush_smp_call_function_queue()
process sync callbacks first.
- Misc fixes and enhancements"
* tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
sched: Replace rq::wake_list
sched: Add rq::ttwu_pending
irq_work, smp: Allow irq_work on call_single_queue
smp: Optimize send_call_function_single_ipi()
smp: Move irq_work_run() out of flush_smp_call_function_queue()
smp: Optimize flush_smp_call_function_queue()
sched: Fix smp_call_function_single_async() usage for ILB
sched/core: Offload wakee task activation if it the wakee is descheduling
sched/core: Optimize ttwu() spinning on p->on_cpu
sched: Defend cfs and rt bandwidth quota against overflow
sched/cpuacct: Fix charge cpuacct.usage_sys
sched/fair: Replace zero-length array with flexible-array
sched/pelt: Sync util/runnable_sum with PELT window when propagating
sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
sched/fair: Optimize enqueue_task_fair()
sched: Make scheduler_ipi inline
sched: Clean up scheduler_ipi()
sched/core: Simplify sched_init()
...
- Add TPAUSE based delay which allows the CPU to enter an optimized power
state while waiting for the delay to pass. The delay is based on TSC
cycles.
- Add tsc_early_khz command line parameter to workaround the problem that
overclocked CPUs can report the wrong frequency via CPUID.16h which
causes the refined calibration to fail because the delta to the initial
frequency value is too big. With the parameter users can provide an
halfways accurate initial value.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7XvMITHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQ59EACWOU2E+S/b+AqKoZRAJWbTASmu2jEU
4AukhjO3A0y+G3EqnCtvQbUbKkthScSmrDJs2Dt8CTO6q3Fqv/f5JgoubgSx9Hbj
pF1hvueOvRBpinzGEJbDbv+HbkoCYr10DZ5dZ8uz120pSnlfSNNpgZ6hJkOFaUHu
nwVEJpkg2x3ZsiJrgyOfdorwbxO5dCNY9YVL3jyVXUi5QfP3lYrr3/Nz6daIRtRn
Q9tj48N4Bk4ASgmg4rSdXd6OKeZ3Oz1nerol5vFvBeaOc8PVcKSu5sSqMIHHUV2M
RJq8T4nW5Y4pkYjpdYP7Pr/3HYbSNW6eU+MycfnJOzYYTIQfFWkG2wHDNuOg/v+A
GC/grS6wNBj/+tZlvWTwLPf44h7V+sowzYPHBWounT/5drFZ+xsm8+Je4s2NtNih
rbG/4oOQ2jn05PNBCCOyLuP33efQ3ub2UHPCoUxckMiX2eqI+iWpdllZLSiSADZY
jlbXgTQ/Fa3nGKVYVDi1GYbx1rBr/HbsbgvGV4D802s7inmev0azrbgc/CECrnvO
rEa501Y1xzxZ7Zet0QvLK/7aKP532pCmgZiBSmcnS73FBbnssvNJiHlAeq4NHtN2
TsaGYLy0iPSj7siXEaeysUKRjUTNNrgRvtWfo35GDjWahgXhixIvVVpwxnWws5cj
aNR5FwxnI03V2A==
=199V
-----END PGP SIGNATURE-----
Merge tag 'x86-timers-2020-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 timer updates from Thomas Gleixner:
"X86 timer specific updates:
- Add TPAUSE based delay which allows the CPU to enter an optimized
power state while waiting for the delay to pass. The delay is based
on TSC cycles.
- Add tsc_early_khz command line parameter to workaround the problem
that overclocked CPUs can report the wrong frequency via CPUID.16h
which causes the refined calibration to fail because the delta to
the initial frequency value is too big. With the parameter users
can provide an halfways accurate initial value"
* tag 'x86-timers-2020-06-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Add tsc_early_khz command line parameter
x86/delay: Introduce TPAUSE delay
x86/delay: Refactor delay_mwaitx() for TPAUSE support
x86/delay: Preparatory code cleanup
- Not a single new clocksource or clockevent driver!
- Device tree updates for various chips
- Fixes and improvements and cleanups all over the place
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7WEEkTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZxrD/dy2y712euXcZLgclXXQYLDnlRyqLN2
t/6XMeoVulah8UAHccCL+RjLLDHp2pJl1yGPe730GJe9Jxv9+ZVUtXak0VoIoQbO
HpNdNLD/Clq2jv1mgoXh1gb5xnEkQsAWcIFAyH4frB+mWZaOhpwqf51JT6QuwnPf
OAq09gGdRogslXKuliEuRgbHL/a6WjC5IG4gzzB/coxYPnjGpBc8eYY+QbISqALD
D84KJaNu/rOTwh8fXawBmFz5GwSk6LiGkh+Br/vSodYh3T23SWjVPJRsNX0UVT/S
m1tEx2njKBBzmyYjrCU2tF0cwdJ0AduARrCFPB11nrsldp4E36fuoRjs+bW5HZAO
JDajzHrPunYMN3pmh/D5Qk4/4v4XpPFluybqaoBIAee9L/8ulNKUMrEmjJHbHe+N
5pSmkUsxciRmRRyjfMK3xD8K1/P/J+dJnCY3qOEbDZ7O4nei7wdemY9awCIRdVWb
vJYWZwkbGFEsicEUjGgBSqesWuSg+pJRC896PQboonarZ4DZFEZ3B7sODNIrzb82
78nhJ/IGYNuJ4eL7yrxmNDllCOava4vwV87tedbWgAtguE4sDLljKb/V3abwOAOt
ZcNvZLCkMiXKdKKEbVJu/WRtQ+lvO1HNM4yB6S6I4R3kegWZu3zqsUL/xzDhk7O5
lrj0DKbKzUqn
=FFm0
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"The truly boring timer and clocksource updates for 5.8:
- Not a single new clocksource or clockevent driver!
- Device tree updates for various chips
- Fixes and improvements and cleanups all over the place"
* tag 'timers-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
dt-bindings: timer: Add renesas,em-sti bindings
clocksource/drivers/timer-versatile: Clear OF_POPULATED flag
clocksource: mips-gic-timer: Mark GIC timer as unstable if ref clock changes
clocksource: mips-gic-timer: Register as sched_clock
clocksource: dw_apb_timer_of: Fix missing clockevent timers
clocksource: dw_apb_timer: Affiliate of-based timer with any CPU
clocksource: dw_apb_timer: Make CPU-affiliation being optional
dt-bindings: timer: Move snps,dw-apb-timer DT schema from rtc
dt-bindings: rtc: Convert snps,dw-apb-timer to DT schema
clocksource/drivers/timer-ti-dm: Do one override clock parent in prepare()
clocksource/drivers/timer-ti-dm: Fix spelling mistake "detectt" -> "detect"
clocksource/drivers/timer-ti-dm: Fix warning for set but not used
clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support
clocksource/drivers/timer-ti-32k: Add support for initializing directly
drivers/clocksource/arm_arch_timer: Remove duplicate error message
clocksource/drivers/arc_timer: Remove duplicate error message
clocksource/drivers/rda: drop redundant Kconfig dependency
clocksource/drivers/timer-ti-dm: Fix warning for set but not used
clocksource/drivers/timer-ti-dm: Add clockevent and clocksource support
clocksource/drivers/timer-ti-32k: Add support for initializing directly
...
- Cleanup of the irq_domain API
- Overhaul of the interrupt chip simulator
- The usual pile of new interrupt chip drivers
- Cleanups, improvements and fixes all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl7WDy0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoU8gD/9kxzl/q7oZ0lnsbtORSZ68hZ+n9rwH
XkLXZVCqOpXgU0Wbe8TLmSUQfxJeMBhTqI6gxkyLWFduRKBP30ChKHbAcvRV9Xj4
s3sdkypiOgilu3skGRm0cUXDPDrACJW0QzZK0TIxVEou5j51opeM+tWb+pGbr7ao
UaKARUsVzcNhYs0K3LoM2NrfXL2NtjPkGM6CE4sx7CaEx73dJyJZHgakleLEAGAo
4ok7A71eB4DHRkeiwAPRLUzaBwGFJT86034+FvFCqwouW0FV40TpKi94Mf+a9IQv
H8HFIUwfTCvfsouzmHYUfqalmShUMGqNOU4qbHeKzf2Nf/OASTOPiQ3ejWEWMkr0
IHSjVzIeE+66RyMe2gkEmQnehrWeR9Djc7yZheo03IuO2oP3YQlpXvyA95hN7T1J
kZ2MVgjNaB7bxQ6eJyA6/xQxv4EJI0a/jQ0ojd1jnomCyRB7k40s9GU2etDmgr08
lA5z8+n9w1lbPWhaZl0BITE2mNoz/b+W+MTiRgN+CqTEXWK7Ee/otoYnZf5nkJpk
0ixxwMvREFoO9JhnxlfG1knQXmL2mLx5I3TGrz7rpx9Ycgoer1hmDWCdMFQaAADL
aVVnw3w8G8cwfqBkBYFHW6hq6228uicutcr22/klXsHnN6OYTHxVRvMb8HP3R7AJ
ExGk4kquk9DhCQ==
=Pi5h
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The generic interrupt departement provides:
- Cleanup of the irq_domain API
- Overhaul of the interrupt chip simulator
- The usual pile of new interrupt chip drivers
- Cleanups, improvements and fixes all over the place"
* tag 'irq-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
irqchip: Fix "Loongson HyperTransport Vector support" driver build on all non-MIPS platforms
dt-bindings: interrupt-controller: Add Loongson PCH MSI
irqchip: Add Loongson PCH MSI controller
dt-bindings: interrupt-controller: Add Loongson PCH PIC
irqchip: Add Loongson PCH PIC controller
dt-bindings: interrupt-controller: Add Loongson HTVEC
irqchip: Add Loongson HyperTransport Vector support
genirq: Check irq_data_get_irq_chip() return value before use
irqchip/sifive-plic: Improve boot prints for multiple PLIC instances
irqchip/sifive-plic: Setup cpuhp once after boot CPU handler is present
irqchip/sifive-plic: Set default irq affinity in plic_irqdomain_map()
irqchip/gic-v2, v3: Drop extra IRQ_NOAUTOEN setting for (E)PPIs
irqdomain: Allow software nodes for IRQ domain creation
irqdomain: Get rid of special treatment for ACPI in __irq_domain_add()
irqdomain: Make __irq_domain_add() less OF-dependent
iio: dummy_evgen: Fix use after free on error in iio_dummy_evgen_create()
irqchip/gic-v3-its: Balance initial LPI affinity across CPUs
irqchip/gic-v3-its: Track LPI distribution on a per CPU basis
genirq/irq_sim: Simplify the API
irqdomain: Make irq_domain_reset_irq_data() available to non-hierarchical users
...
- Convert to use the new mount apis;
- Some random cleanup patches.
-----BEGIN PGP SIGNATURE-----
iIsEABYIADMWIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCXtbfOhUcaHNpYW5na2Fv
QHJlZGhhdC5jb20ACgkQOTcx3B+15gTvZgD6Ap8mYxRaW7Qta+HEyFuyRrxWZ/XZ
pq/hYiouGosDdaMBAOUNl8pGlPX54T+Y9VZv0wV0Dp4pan6NApdgtL9fIQUE
=QhQh
-----END PGP SIGNATURE-----
Merge tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"The most interesting part is the new mount api conversion, which is
actually a old patch already pending for several cycles. And the
others are recent trivial cleanups here.
Summary:
- Convert to use the new mount apis
- Some random cleanup patches"
* tag 'erofs-for-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: suppress false positive last_block warning
erofs: convert to use the new mount fs_context api
erofs: code cleanup by removing ifdef macro surrounding
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl7U50AACgkQxWXV+ddt
WDtK1g//RXeNsTguYQr1N9R5eUPThjLEI0+4J0l4SYfCPU8Ou3C7nqpOEJJQgm8F
ezZE+16cWi9U5uGueOc+w0rfyz4AuIXKgzoz+c0/GG2+yV5jp6DsAMbWqojAb96L
V/N3HxEzR66jqwgVUBE/x5okb2SyY7//B1l/O0amc66XDO7KTMImpIwThere6zWZ
o2SNpYpHAPQeUYJQx8h+FAW3w1CxrCZmnifazU9Jqe9J7QeQLg7rbUlJDV38jySm
ZOA8ohKN9U1gPZy+dTU3kdyyuBIq1etkIaSPJANyTo5TczPKiC0IMg75cXtS4ae/
NSxhccMpSIjVMcIHARzSFGYKNP3sGNRsmaTUg/2Cx/9GoHOhYMiCAVc8qtBBpwJO
UI0siexrCe64RuTBMRRc128GdFv7IjmSImcdi8xaR62bCcUiNdEa3zvjRe/9tOEH
ET7Z85oBnKpSzpC3MdhSUU4dtHY5XLawP8z3oUU1VSzSWM2DVjlHf79/VzbOfp18
miCVpt94lCn/gUX7el6qcnbuvMAjDyeC6HmfD+TwzQgGwyV6TLgKN9lRXeH/Oy6/
VgjGQSavGHMll3zIGURmrBCXKudjJg0J+IP4wN1TimmSEMfwKH+7tnekQd8y5qlF
eXEIqlWNykKeDzEnmV9QJy+/cV83hVWM/mUslcTx39tLN/3B/Us=
=qTt8
-----END PGP SIGNATURE-----
Merge tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Highlights:
- speedup dead root detection during orphan cleanup, eg. when there
are many deleted subvolumes waiting to be cleaned, the trees are
now looked up in radix tree instead of a O(N^2) search
- snapshot creation with inherited qgroup will mark the qgroup
inconsistent, requires a rescan
- send will emit file capabilities after chown, this produces a
stream that does not need postprocessing to set the capabilities
again
- direct io ported to iomap infrastructure, cleaned up and simplified
code, notably removing last use of struct buffer_head in btrfs code
Core changes:
- factor out backreference iteration, to be used by ordinary
backreferences and relocation code
- improved global block reserve utilization
* better logic to serialize requests
* increased maximum available for unlink
* improved handling on large pages (64K)
- direct io cleanups and fixes
* simplify layering, where cloned bios were unnecessarily created
for some cases
* error handling fixes (submit, endio)
* remove repair worker thread, used to avoid deadlocks during
repair
- refactored block group reading code, preparatory work for new type
of block group storage that should improve mount time on large
filesystems
Cleanups:
- cleaned up (and slightly sped up) set/get helpers for metadata data
structure members
- root bit REF_COWS got renamed to SHAREABLE to reflect the that the
blocks of the tree get shared either among subvolumes or with the
relocation trees
Fixes:
- when subvolume deletion fails due to ENOSPC, the filesystem is not
turned read-only
- device scan deals with devices from other filesystems that changed
ownership due to overwrite (mkfs)
- fix a race between scrub and block group removal/allocation
- fix long standing bug of a runaway balance operation, printing the
same line to the syslog, caused by a stale status bit on a reloc
tree that prevented progress
- fix corrupt log due to concurrent fsync of inodes with shared
extents
- fix space underflow for NODATACOW and buffered writes when it for
some reason needs to fallback to COW mode"
* tag 'for-5.8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (133 commits)
btrfs: fix space_info bytes_may_use underflow during space cache writeout
btrfs: fix space_info bytes_may_use underflow after nocow buffered write
btrfs: fix wrong file range cleanup after an error filling dealloc range
btrfs: remove redundant local variable in read_block_for_search
btrfs: open code key_search
btrfs: split btrfs_direct_IO to read and write part
btrfs: remove BTRFS_INODE_READDIO_NEED_LOCK
fs: remove dio_end_io()
btrfs: switch to iomap_dio_rw() for dio
iomap: remove lockdep_assert_held()
iomap: add a filesystem hook for direct I/O bio submission
fs: export generic_file_buffered_read()
btrfs: turn space cache writeout failure messages into debug messages
btrfs: include error on messages about failure to write space/inode caches
btrfs: remove useless 'fail_unlock' label from btrfs_csum_file_blocks()
btrfs: do not ignore error from btrfs_next_leaf() when inserting checksums
btrfs: make checksum item extension more efficient
btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents
btrfs: unexport btrfs_compress_set_level()
btrfs: simplify iget helpers
...
- Introduce DONTCACHE flags for dentries and inodes. This hint will
cause the VFS to drop the associated objects immediately after the
last put, so that we can change the file access mode (DAX or page
cache) on the fly.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl68FowACgkQ+H93GTRK
tOtNzA/9FkXXQYAlTWK/toHfJV8DQT/Kx1fvf8Ng0EphBUQa/rNzlcMzFg7Gw5Cs
Rzis96+xj4q//iseLZN5LLxaoxqT2Qipza0GWCMJpQG/4wTWM0Ar7BnG/Vc87lUV
F0mXnILZOUMFzr8Zj9q4ka6UGRTDSXXtwNXqBuPpIZyVbMQvPtXHhM3lWV5RUQwm
fznBxDAEGoVXiyID2OrZD5tS4BMd16uFWAWLjWphpcy18zfC7zp0+0MQik4v/9oi
54pZdtPT9/dQOu/BI8tfLP45XzZ6f++gXy2p/G96dy7ism1u40ML77ojEkadVVFe
Bf7t+EswNxrx/em/ugWbcJDtrxttSqU47g2AXsbJJB2+aHCih6Cfid41lMyRvlhR
d4cumoteX7IF/PpT3YaKHWQBo5OxHK0a2CBPd6czrCBw5yXrEUagdmw1XQ//bw5e
FRCg4eMcEW0UgINvBCHWdWRx6VaL8ngMMsflVJ/lY7FeVvM10ZYRFzJoryoebSPm
/yWcoHFsTPC8K0nWVmbwPazVE19I0g4y6Wiw39YvZDzZRzM9PcQI4DBxQcab+Va/
FPfXEXkpz0GiC6zjs/QfkPtg60GI1IG5Um4JUzdv6ce1P0p1rGcu5WiNYearahE7
7V/44WGIEAd4NP7R0JPTI0Fqv7v6uuDzMoCp7YDn8gE4FCJTt6M=
=ebl3
-----END PGP SIGNATURE-----
Merge tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull DAX updates part two from Darrick Wong:
"This time around, we're hoisting the DONTCACHE flag from XFS into the
VFS so that we can make the incore DAX mode changes become effective
sooner.
We can't change the file data access mode on a live inode because we
don't have a safe way to change the file ops pointers. The incore
state change becomes effective at inode loading time, which can happen
if the inode is evicted. Therefore, we're making it so that
filesystems can ask the VFS to evict the inode as soon as the last
holder drops.
The per-fs changes to make this call this will be in subsequent pull
requests from Ted and myself.
Summary:
- Introduce DONTCACHE flags for dentries and inodes. This hint will
cause the VFS to drop the associated objects immediately after the
last put, so that we can change the file access mode (DAX or page
cache) on the fly"
* tag 'vfs-5.8-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs: Introduce DCACHE_DONTCACHE
fs: Lift XFS_IDONTCACHE to the VFS layer
- Clean up io_is_direct.
- Add a new statx flag to indicate when file data access is being done
via DAX (as opposed to the page cache).
- Update the documentation for how system administrators and application
programmers can take advantage of the (still experimental DAX) feature.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl6wO7UACgkQ+H93GTRK
tOtEaw//eShC2YE0S+GS7ihQ3x71PJa4Is0VZOIpTHl01aqMSegwB3QbDQbVUhn1
TzLhUw4pZsz3R9GbUOrfHOYRt+aSP2t0WNhIulDeBp41CYJQSaFt85KnfM9hoBOi
VYssum3Lu7/6ReKrDD/mumzWYkts+JDCuXRmt7nOQeZJVNXOCBBbvN354V4/IKLY
wB4Wnaq3f3gYniXYW/23aCX+kocaOIUZtK6aFKyeD0KvfP5toDlpw1cBVMoM9CmO
bmEy8vKf4lgFZDLeDMqmWOecMgEH5h0baN5Psu13WuDCiCd6maBl0KpxVpVlwsep
yVz6mMbZjmLOJ2lqyw+lZb+XicD+K3yRVSTGKxV3VbuRjeX9tjVG5Im13VesNvJB
WWJq/CkOU8W0Zs7Q5RbUDGbFFWDJSI/OStAU+UeuWvL9Gndv7hqv6H904qbPPtEu
4m4Y34ARzrEaKpkABKKwQ53cLClNxmmgUN9N3cXK3mk8idlX4zM3j6+HJYUxXTO+
fBjhOlyUy2KaWmzZoJp28QvaU4iegGmMSuRnQ9HAvXmdxUA2K6+wjS6LCZGh04vz
z7SbzTBlo2kvsKdRMwJ306s2QA0/HvmKHHLI+p8OQANce9hjhE3XdJayzhitd0fk
k0D/y8OY+fbCSgI8C4g66lA8Zf2sos/ulD0QTTNBPfU2rRWkKUc=
=rS19
-----END PGP SIGNATURE-----
Merge tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull DAX updates part one from Darrick Wong:
"After many years of LKML-wrangling about how to enable programs to
query and influence the file data access mode (DAX) when a filesystem
resides on storage devices such as persistent memory, Ira Weiny has
emerged with a proposed set of standard behaviors that has not been
shot down by anyone! We're more or less standardizing on the current
XFS behavior and adapting ext4 to do the same.
This is the first of a handful pull requests that will make ext4 and
XFS present a consistent interface for user programs that care about
DAX. We add a statx attribute that programs can check to see if DAX is
enabled on a particular file. Then, we update the DAX documentation to
spell out the user-visible behaviors that filesystems will guarantee
(until the next storage industry shakeup). The on-disk inode flag has
been in XFS for a few years now.
Summary:
- Clean up io_is_direct.
- Add a new statx flag to indicate when file data access is being
done via DAX (as opposed to the page cache).
- Update the documentation for how system administrators and
application programmers can take advantage of the (still
experimental DAX) feature"
Link: https://lore.kernel.org/lkml/20200505002016.1085071-1-ira.weiny@intel.com/
* tag 'vfs-5.8-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
Documentation/dax: Update Usage section
fs/stat: Define DAX statx attribute
fs: Remove unneeded IS_DAX() check in io_is_direct()
- Various cleanups to remove dead code, unnecessary conditionals,
asserts, etc.
- Fix a linker warning caused by xfs stuffing '-g' into CFLAGS
redundantly.
- Tighten up our dmesg logging to ensure that everything is prefixed
with 'XFS' for easier grepping.
- Kill a bunch of typedefs.
- Refactor the deferred ops code to reduce indirect function calls.
- Increase type-safety with the deferred ops code.
- Make the DAX mount options a tri-state.
- Fix some error handling problems in the inode flush code and clean up
other inode flush warts.
- Refactor log recovery so that each log item recovery functions now live
with the other log item processing code.
- Fix some SPDX forms.
- Fix quota counter corruption if the fs crashes after running
quotacheck but before any dquots get logged.
- Don't fail metadata verification on zero-entry attr leaf blocks, since
they're just part of the disk format now due to a historic lack of log
atomicity.
- Don't allow SWAPEXT between files with different [ugp]id when quotas
are enabled.
- Refactor inode fork reading and verification to run directly from the
inode-from-disk function. This means that we now actually guarantee
that _iget'ted inodes are totally verified and ready to go.
- Move the incore inode fork format and extent counts to the ifork
structure.
- Scalability improvements by reducing cacheline pingponging in
struct xfs_mount.
- More scalability improvements by removing m_active_trans from the
hot path.
- Fix inode counter update sanity checking to run /only/ on debug
kernels.
- Fix longstanding inconsistency in what error code we return when a
program hits project quota limits (ENOSPC).
- Fix group quota returning the wrong error code when a program hits
group quota limits.
- Fix per-type quota limits and grace periods for group and project
quotas so that they actually work.
- Allow extension of individual grace periods.
- Refactor the non-reclaim inode radix tree walking code to remove a
bunch of stupid little functions and straighten out the
inconsistent naming schemes.
- Fix a bug in speculative preallocation where we measured a new
allocation based on the last extent mapping in the file instead of
looking farther for the last contiguous space allocation.
- Force delalloc writes to unwritten extents. This closes a
stale disk contents exposure vector if the system goes down before
the write completes.
- More lockdep whackamole.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAl7OjhgACgkQ+H93GTRK
tOuGeBAApuP9ohtvrJT9FW7U+OrRsK3lw/3R+MEYpJu8GKLpGbJ6j+SKrTHxxLvu
Rp63YLIlHBOz2rNa4brm/wW8gGJIGXOnGpuiGq0Irl01xEmwqmjOLfLcYkYhno1E
i+rG0PiKYZeo/xhLtTKGl+NAwHHxmbOmxUtYHnbinHtPzDyYLQ0wff+oUkmQ7ydg
bMYFMXohoJ3Pc5UjmUrCuJj1cvYOUwl0P4LGKiq5Zud61AkBCSskEpk+oo5xFcEX
JJc1xkn5MPi+oGpSYqhnSZ6aSjwp53/i44O9volp5vCRXXv1eLVni2u/ScZ85L72
HXxoDyuZOUupirIfMBQFHsazDGPGyFIqtPhGlXoTJjrwX+ymimY6CU/0e+Xu9DEu
krlxajfUssH30zyG2q/2TaxslU35CROH6hVBXFe0Y5cEEsOIf2aOpErUhhw2YyS7
onN9gb2NBBQdYtHqIMwsbhcgq60g5H6JfGriB5dJimXXLmpuTfAREGCY2AqIoB1x
+8QFod0WwsMn6FYhi/UpZjC9qp/WTvojBUEt8Ci3ketUFwO1CLf9qm6Hj71RL3fs
fCEDHx/ZMMft7Bdbf36lICoMAhF/KfNcRn1PsQdpW4LY1Aml/7qjFNZthSVRDW+E
rhzNu+RIzGEQsSemBvccRaaTP3HFqN+qPATu2K0sALaa1LRFxzQ=
=/NYc
-----END PGP SIGNATURE-----
Merge tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs updates from Darrick Wong:
"Most of the changes this cycle are refactoring of existing code in
preparation for things landing in the future.
We also fixed various problems and deficiencies in the quota
implementation, and (I hope) the last of the stale read vectors by
forcing write allocations to go through the unwritten state until the
write completes.
Summary:
- Various cleanups to remove dead code, unnecessary conditionals,
asserts, etc.
- Fix a linker warning caused by xfs stuffing '-g' into CFLAGS
redundantly.
- Tighten up our dmesg logging to ensure that everything is prefixed
with 'XFS' for easier grepping.
- Kill a bunch of typedefs.
- Refactor the deferred ops code to reduce indirect function calls.
- Increase type-safety with the deferred ops code.
- Make the DAX mount options a tri-state.
- Fix some error handling problems in the inode flush code and clean
up other inode flush warts.
- Refactor log recovery so that each log item recovery functions now
live with the other log item processing code.
- Fix some SPDX forms.
- Fix quota counter corruption if the fs crashes after running
quotacheck but before any dquots get logged.
- Don't fail metadata verification on zero-entry attr leaf blocks,
since they're just part of the disk format now due to a historic
lack of log atomicity.
- Don't allow SWAPEXT between files with different [ugp]id when
quotas are enabled.
- Refactor inode fork reading and verification to run directly from
the inode-from-disk function. This means that we now actually
guarantee that _iget'ted inodes are totally verified and ready to
go.
- Move the incore inode fork format and extent counts to the ifork
structure.
- Scalability improvements by reducing cacheline pingponging in
struct xfs_mount.
- More scalability improvements by removing m_active_trans from the
hot path.
- Fix inode counter update sanity checking to run /only/ on debug
kernels.
- Fix longstanding inconsistency in what error code we return when a
program hits project quota limits (ENOSPC).
- Fix group quota returning the wrong error code when a program hits
group quota limits.
- Fix per-type quota limits and grace periods for group and project
quotas so that they actually work.
- Allow extension of individual grace periods.
- Refactor the non-reclaim inode radix tree walking code to remove a
bunch of stupid little functions and straighten out the
inconsistent naming schemes.
- Fix a bug in speculative preallocation where we measured a new
allocation based on the last extent mapping in the file instead of
looking farther for the last contiguous space allocation.
- Force delalloc writes to unwritten extents. This closes a stale
disk contents exposure vector if the system goes down before the
write completes.
- More lockdep whackamole"
* tag 'xfs-5.8-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (129 commits)
xfs: more lockdep whackamole with kmem_alloc*
xfs: force writes to delalloc regions to unwritten
xfs: refactor xfs_iomap_prealloc_size
xfs: measure all contiguous previous extents for prealloc size
xfs: don't fail unwritten extent conversion on writeback due to edquot
xfs: rearrange xfs_inode_walk_ag parameters
xfs: straighten out all the naming around incore inode tree walks
xfs: move xfs_inode_ag_iterator to be closer to the perag walking code
xfs: use bool for done in xfs_inode_ag_walk
xfs: fix inode ag walk predicate function return values
xfs: refactor eofb matching into a single helper
xfs: remove __xfs_icache_free_eofblocks
xfs: remove flags argument from xfs_inode_ag_walk
xfs: remove xfs_inode_ag_iterator_flags
xfs: remove unused xfs_inode_ag_iterator function
xfs: replace open-coded XFS_ICI_NO_TAG
xfs: move eofblocks conversion function to xfs_ioctl.c
xfs: allow individual quota grace period extension
xfs: per-type quota timers and warn limits
xfs: switch xfs_get_defquota to take explicit type
...
Pull lockdown update from James Morris:
"An update for the security subsystem to allow unprivileged users
to see the status of the lockdown feature. From Jeremy Cline"
Also an added comment to describe CAP_SETFCAP.
* 'next-general' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
capabilities: add description for CAP_SETFCAP
lockdown: Allow unprivileged users to see lockdown status
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl7VnLoUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPkjA//fHFbgHBbiZrS7v/vi61wdpEtGzmn
/hr4Z5DpFmJdCTGeGItST8Xq4KEqlLGrMclk+PsG0H7BMJEEp+0XJ+begqNvC8PF
+JzP+oBqoO0SoF5z0jOnBBtzK8R3vmVgcPO3dNdEgNBQG3T7/GQLUTX8DylBDOI1
yFeuewRD7sK/rIg/S6t+B0ut7Uer5CjEIed4iQZ3eKIUqE6/C1zpmQj98MH9L5uh
yN0tdF8aOZvgD6v1bfmvgAnnODFvvKcogDn+hvbqRhrDdhgt1DAErIjYeqRemQRc
g7Xve4i7VivXC4o8nhUy00FWqzCB5tcydR0cwgg4iR/JgKvn18s0vRQV9SU7Nt+o
pXOex6qHlFCJpjTop+DCBEkGK9V7UBMM6t6gwR/bpkDMYkgIjJrCIQTyw8/HrKKt
fntryXf9juM0Owh/YOp5jKXPddhkfuztViJ+FnxsI2sho643Gg6/Wfy2slvJ0udH
i0bnnacW/6pysf/eLrPsF89IacAGydkhdZwaSno3GLyCtXxrqJU4cs2wSpUq0Wiz
g4kB4hpPXgrQszLriEsF0gRVcRu2nOF4ISXlUqfSw7i/nFT7+axYUjgBg9PpV1Mj
GyLBSOQp1xs4S/oglfJ5nE4UtS4m187t4JVWOxfqqyWE/O2cqUPtaS+52m0aIWTH
6HFWbmL5+Dxsm+4=
=A+bX
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"The highlights:
- A number of improvements to various SELinux internal data
structures to help improve performance. We move the role
transitions into a hash table. In the content structure we shift
from hashing the content string (aka SELinux label) to the
structure itself, when it is valid. This last change not only
offers a speedup, but it helps us simplify the code some as well.
- Add a new SELinux policy version which allows for a more space
efficient way of storing the filename transitions in the binary
policy. Given the default Fedora SELinux policy with the unconfined
module enabled, this change drops the policy size from ~7.6MB to
~3.3MB. The kernel policy load time dropped as well.
- Some fixes to the error handling code in the policy parser to
properly return error codes when things go wrong"
* tag 'selinux-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: netlabel: Remove unused inline function
selinux: do not allocate hashtabs dynamically
selinux: fix return value on error in policydb_read()
selinux: simplify range_write()
selinux: fix error return code in policydb_read()
selinux: don't produce incorrect filename_trans_count
selinux: implement new format of filename transitions
selinux: move context hashing under sidtab
selinux: hash context structure directly
selinux: store role transitions in a hash table
selinux: drop unnecessary smp_load_acquire() call
selinux: fix warning Comparison to bool
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl7VnKEUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXMbHA/+PQmrPdzPvkLAjjf1y3LXvyEIAXIQ
h2r8SxHa7iGyF6vVPz+ya7ux0KAm8wCVdfkokWG5jxjwK7pysS6gx9JzBVK7dbhD
FsKBSoq9+to9fYlaCyX7vn85C7kK5oGrwS/ECos0BHBpij8ukLgvPQu+PDs7d4xW
1X2Nrgqnc7M4L8ayzXTQX0fDWcOkapzaN86+R+Lavb4hO/FownaYbuCFn+1mdzux
ZNBpt3/y1pM6vi5YBkI1rkauBCmkl/YSX/mf/EwDNlQ0XmcadGQ6z7iwjyiE826g
etCHWD3cgQH7Zzz6CxBNX8Xbq0nIQueHHiFYpVyy9lf4xleFvnfFDebrs8Q9TB6G
jTWU8okioUKPZyRDaRuIAmCf/LBQRsMkIYTU3w6J0ZqsBycTw3NXPiQArmlxZESM
HquxWpKoZytRiw581hiSGKNqY+R3FvA+Jroc/7bWfNOE3IdFxegvCsC3giKJf1rY
AlQitehql9a5jp7A57+477WRYOygYRnd+ntMD5KqR90QSIcQXeg0/lFKhco+zc2p
bXbWLE+aaOTGCeC+3Eow3T7FMWmrIn6ccKgM84+WT7YQYtRqUYu3RIZbnlYXN7uH
8xGXT6ccPcEwIjgyF87J0KyGhrbT1N91Jd2jMJkEry9OLAn/yr+pUBQtAa456MMi
JYevS4atZaUqgvw=
=iLfC
-----END PGP SIGNATURE-----
Merge tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"Summary of the significant patches:
- Record information about binds/unbinds to the audit multicast
socket. This helps identify which processes have/had access to the
information in the audit stream.
- Cleanup and add some additional information to the netfilter
configuration events collected by audit.
- Fix some of the audit error handling code so we don't leak network
namespace references"
* tag 'audit-pr-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: add subj creds to NETFILTER_CFG record to
audit: Replace zero-length array with flexible-array
audit: make symbol 'audit_nfcfgs' static
netfilter: add audit table unregister actions
audit: tidy and extend netfilter_cfg x_tables
audit: log audit netlink multicast bind and unbind
audit: fix a net reference leak in audit_list_rules_send()
audit: fix a net reference leak in audit_send_reply()