kernel_optimize_test

Yonghong Song 0c9a876f28 bpf: Fix potentially incorrect results with bpf_get_local_storage()

commit a2baf4e8bb0f306fbed7b5e6197c02896a638ab5 upstream.

Commit b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper") fixed a bug for bpf_get_local_storage() helper so different tasks won't mess up with each other's percpu local storage.

The percpu data contains 8 slots so it can hold up to 8 contexts (same or different tasks), for 8 different program runs, at the same time. This in general is sufficient. But our internal testing showed the following warning multiple times:

  [...]
  warning:
    WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
       __cgroup_bpf_run_filter_sock_ops+0x13e/0x180
    RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
    <IRQ>
     tcp_call_bpf.constprop.99+0x93/0xc0
     tcp_conn_request+0x41e/0xa50
     ? tcp_rcv_state_process+0x203/0xe00
     tcp_rcv_state_process+0x203/0xe00
     ? sk_filter_trim_cap+0xbc/0x210
     ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
     tcp_v6_do_rcv+0x181/0x3e0
     tcp_v6_rcv+0xc65/0xcb0
     ip6_protocol_deliver_rcu+0xbd/0x450
     ip6_input_finish+0x11/0x20
     ip6_input+0xb5/0xc0
     ip6_sublist_rcv_finish+0x37/0x50
     ip6_sublist_rcv+0x1dc/0x270
     ipv6_list_rcv+0x113/0x140
     __netif_receive_skb_list_core+0x1a0/0x210
     netif_receive_skb_list_internal+0x186/0x2a0
     gro_normal_list.part.170+0x19/0x40
     napi_complete_done+0x65/0x150
     mlx5e_napi_poll+0x1ae/0x680
     __napi_poll+0x25/0x120
     net_rx_action+0x11e/0x280
     __do_softirq+0xbb/0x271
     irq_exit_rcu+0x97/0xa0
     common_interrupt+0x7f/0xa0
     </IRQ>
     asm_common_interrupt+0x1e/0x40
    RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
     ? __cgroup_bpf_run_filter_skb+0x378/0x4e0
     ? do_softirq+0x34/0x70
     ? ip6_finish_output2+0x266/0x590
     ? ip6_finish_output+0x66/0xa0
     ? ip6_output+0x6c/0x130
     ? ip6_xmit+0x279/0x550
     ? ip6_dst_check+0x61/0xd0
  [...]

Using drgn [0] to dump the percpu buffer contents showed that on this CPU slot 0 is still available, but slots 1-7 are occupied and those tasks in slots 1-7 mostly don't exist any more. So we might have issues in bpf_cgroup_storage_unset().

Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset(). Currently, it tries to unset "current" slot with searching from the start. So the following sequence is possible:

  1. A task is running and claims slot 0
  2. Running BPF program is done, and it checked slot 0 has the "task" and ready to reset it to NULL (not yet).
  3. An interrupt happens, another BPF program runs and it claims slot 1 with the same task.
  4. The unset() in interrupt context releases slot 0 since it matches "task".
  5. Interrupt is done, the task in process context reset slot 0.

At the end, slot 1 is not reset and the same process can continue to occupy slots 2-7 and finally, when the above step 1-5 is repeated again, step 3 BPF program won't be able to claim an empty slot and a warning will be issued.

To fix the issue, for unset() function, we should traverse from the last slot to the first. This way, the above issue can be avoided.

The same reverse traversal should also be done in bpf_get_local_storage() helper itself. Otherwise, incorrect local storage may be returned to BPF program.

  [0] https://github.com/osandov/drgn

Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

2021-09-03 10:09:31 +02:00
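The five-step sequence in the commit message can be replayed in a small user-space model. This is only a sketch under stated assumptions, not the kernel code: `struct slots`, `slot_claim()`, `slot_find()`, `slot_clear()` and `replay_race()` are hypothetical names, task identity is reduced to an integer, and the interrupt is modelled by splitting the unset into a separate "find" and "clear" so that another claim/unset pair can run in between.

```c
#include <assert.h>
#include <stddef.h>

#define NEST_MAX 8 /* the commit message says the percpu data has 8 slots */

/* Hypothetical stand-in for one CPU's slot array; 0 means "free". */
struct slots {
    int task[NEST_MAX];
};

/* Claim the first free slot for `task`; -1 models the "no empty slot" warning. */
static int slot_claim(struct slots *s, int task)
{
    for (int i = 0; i < NEST_MAX; i++) {
        if (s->task[i] == 0) {
            s->task[i] = task;
            return i;
        }
    }
    return -1;
}

/* Find a slot owned by `task`, scanning first-to-last or last-to-first. */
static int slot_find(const struct slots *s, int task, int reverse)
{
    for (int n = 0; n < NEST_MAX; n++) {
        int i = reverse ? NEST_MAX - 1 - n : n;
        if (s->task[i] == task)
            return i;
    }
    return -1;
}

static void slot_clear(struct slots *s, int idx)
{
    if (idx >= 0)
        s->task[idx] = 0;
}

/* Replay steps 1-5 and return how many slots remain occupied afterwards. */
int replay_race(int reverse)
{
    struct slots s = { {0} };
    const int task = 42; /* arbitrary id for the one task involved */

    slot_claim(&s, task);                          /* 1: claims slot 0 */
    int pending = slot_find(&s, task, reverse);    /* 2: found, not yet reset */
    slot_claim(&s, task);                          /* 3: interrupt claims slot 1 */
    slot_clear(&s, slot_find(&s, task, reverse));  /* 4: unset in interrupt */
    slot_clear(&s, pending);                       /* 5: process context resets */

    int leaked = 0;
    for (int i = 0; i < NEST_MAX; i++)
        leaked += (s.task[i] != 0);
    return leaked;
}
```

In this model, `replay_race(0)` (search from the start) leaves slot 1 occupied, while `replay_race(1)` (search from the last slot) leaves every slot free, which matches the reasoning behind the reverse traversal described above.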
..
bpf	bpf: Fix potentially incorrect results with bpf_get_local_storage()	2021-09-03 10:09:31 +02:00
cgroup	cgroup1: fix leaked context root causing sporadic NULL deref in LTP	2021-07-31 08:16:11 +02:00
configs
debug	kgdb: fix to kill breakpoints on initmem after boot	2021-03-04 11:38:46 +01:00
dma	dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}	2021-07-28 14:35:38 +02:00
entry	x86/entry: Move nmi entry/exit into common code	2021-03-17 17:06:36 +01:00
events	perf: Fix data race between pin_count increment/decrement	2021-06-16 12:01:45 +02:00
gcov	gcov: re-fix clang-11+ support	2021-04-14 08:41:58 +02:00
irq	genirq/timings: Prevent potential array overflow in __irq_timings_store()	2021-08-18 08:59:15 +02:00
kcsan	kcsan: Fix debugfs initcall return type	2021-05-26 12:06:54 +02:00
livepatch	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
locking	lockdep: Fix wait-type for empty stack	2021-07-14 16:56:10 +02:00
power	PM: EM: postpone creating the debugfs dir till fs_initcall	2021-03-30 14:32:04 +02:00
printk	printk: fix deadlock when kernel panic	2021-03-04 11:38:41 +01:00
rcu	srcu: Provide polling interfaces for Tiny SRCU grace periods	2021-09-03 10:09:30 +02:00
sched	kthread: Fix PF_KTHREAD vs to_kthread() race	2021-09-03 10:09:31 +02:00
time	timers: Move clearing of base::timer_running under base:: Lock	2021-08-12 13:22:15 +02:00
trace	tracing / histogram: Fix NULL pointer dereference on strcmp() on NULL event name	2021-08-26 08:35:54 -04:00
.gitignore	kbuild: update config_data.gz only when the content of .config is changed	2021-05-11 14:47:37 +02:00
acct.c	kernel: acct.c: fix some kernel-doc nits	2020-10-16 11:11:19 -07:00
async.c
audit_fsnotify.c	fsnotify: generalize handle_inode_event()	2020-12-30 11:54:18 +01:00
audit_tree.c	audit: move put_tree() to avoid trim_trees refcount underflow and UAF	2021-09-03 10:09:31 +02:00
audit_watch.c	fsnotify: generalize handle_inode_event()	2020-12-30 11:54:18 +01:00
audit.c	audit: Remove redundant null check	2020-08-26 09:10:39 -04:00
audit.h	audit: change unnecessary globals into statics	2020-08-17 20:26:58 -04:00
auditfilter.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
auditsc.c
backtracetest.c
bounds.c
capability.c	LSM: Signal to SafeSetID when setting group IDs	2020-10-13 09:17:34 -07:00
compat.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c
cpu_pm.c	notifier: Fix broken error handling pattern	2020-09-01 09:58:03 +02:00
cpu.c	cpu/hotplug: Cure the cpusets trainwreck	2021-07-19 09:44:59 +02:00
crash_core.c	crash_core, vmcoreinfo: append 'SECTION_SIZE_BITS' to vmcoreinfo	2021-06-23 14:42:52 +02:00
crash_dump.c
cred.c	ucounts: Increase ucounts reference counter before the security hook	2021-09-03 10:09:24 +02:00
delayacct.c
dma.c
exec_domain.c
exit.c	kernel/io_uring: cancel io_uring before task works	2021-01-30 13:55:18 +01:00
extable.c
fail_function.c	fail_function: Remove a redundant mutex unlock	2020-11-19 11:58:16 -08:00
fork.c	sched/core: Initialize the idle task with preemption disabled	2021-07-14 16:55:50 +02:00
freezer.c	Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"	2021-04-07 15:00:14 +02:00
futex.c	mm, futex: fix shared futex pgoff on shmem huge page	2021-06-30 08:47:29 -04:00
gen_kheaders.sh
groups.c	LSM: Signal to SafeSetID when setting group IDs	2020-10-13 09:17:34 -07:00
hung_task.c	kernel/hung_task.c: make type annotations consistent	2020-11-02 12:14:19 -08:00
iomem.c
irq_work.c
jump_label.c	jump_label: Fix jump_label_text_reserved() vs __init	2021-07-20 16:05:58 +02:00
kallsyms.c	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
kcmp.c	exec: Transform exec_update_mutex into a rw_semaphore	2021-01-09 13:46:24 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c	kcov: make some symbols static	2020-08-12 10:58:02 -07:00
kexec_core.c	kernel: kexec: remove the lock operation of system_transition_mutex	2021-02-03 23:28:37 +01:00
kexec_elf.c
kexec_file.c	kernel: kexec_file: fix error return code of kexec_calculate_store_digests()	2021-05-19 10:13:09 +02:00
kexec_internal.h
kexec.c	LSM: Introduce kernel_post_load_data() hook	2020-10-05 13:37:03 +02:00
kheaders.c
kmod.c	kmod: remove redundant "be an" in the comment	2020-08-12 10:58:01 -07:00
kprobes.c	kprobes: Fix to delay the kprobes jump optimization	2021-03-04 11:38:35 +01:00
ksysfs.c
kthread.c	kthread: Fix PF_KTHREAD vs to_kthread() race	2021-09-03 10:09:31 +02:00
latencytop.c
Makefile	kbuild: update config_data.gz only when the content of .config is changed	2021-05-11 14:47:37 +02:00
module_signature.c	module: harden ELF info handling	2021-03-25 09:04:11 +01:00
module_signing.c	module: harden ELF info handling	2021-03-25 09:04:11 +01:00
module-internal.h
module.c	module: limit enabling module.sig_enforce	2021-06-30 08:47:15 -04:00
notifier.c	notifier: Fix broken error handling pattern	2020-09-01 09:58:03 +02:00
nsproxy.c
padata.c	padata: fix possible padata_works_lock deadlock	2020-09-04 17:51:55 +10:00
panic.c	panic: don't dump stack twice on warn	2020-11-14 11:26:04 -08:00
params.c	params: Replace zero-length array with flexible-array member	2020-10-29 17:22:59 -05:00
pid_namespace.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
pid.c	exec: Transform exec_update_mutex into a rw_semaphore	2021-01-09 13:46:24 +01:00
profile.c
ptrace.c	ptrace: make ptrace() fail if the tracee changed its pid unexpectedly	2021-05-26 12:06:49 +02:00
range.c	kernel.h: split out min()/max() et al. helpers	2020-10-16 11:11:19 -07:00
reboot.c	reboot: fix overflow parsing reboot cpu number	2020-11-14 11:26:03 -08:00
regset.c
relay.c	kernel/relay.c: drop unneeded initialization	2020-10-16 11:11:22 -07:00
resource.c	kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources	2021-05-19 10:13:09 +02:00
rseq.c
scftorture.c	scftorture: Add cond_resched() to test loop	2020-08-24 18:38:38 -07:00
scs.c	mm: memcontrol: account kernel stack per node	2020-08-07 11:33:25 -07:00
seccomp.c	seccomp: Fix setting loaded filter count during TSYNC	2021-08-18 08:59:06 +02:00
signal.c	ptrace: fix task_join_group_stop() for the case when current is traced	2020-11-02 12:14:19 -08:00
smp.c	smp: Fix smp_call_function_single_async prototype	2021-05-14 09:50:46 +02:00
smpboot.c	sched/core: Initialize the idle task with preemption disabled	2021-07-14 16:55:50 +02:00
smpboot.h
softirq.c	softirq: Add debug check to __raise_softirq_irqoff()	2020-09-16 15:18:56 +02:00
stackleak.c	stackleak: let stack_erasing_sysctl take a kernel pointer buffer	2020-09-19 13:13:39 -07:00
stacktrace.c	stacktrace: Remove reliable argument from arch_stack_walk() callback	2020-09-18 14:24:16 +01:00
static_call.c	static_call: Fix static_call_text_reserved() vs __init	2021-07-20 16:05:58 +02:00
stop_machine.c	stop_machine, rcu: Mark functions as notrace	2020-10-26 12:12:27 +01:00
sys_ni.c	mm/madvise: introduce process_madvise() syscall: an external memory hinting API	2020-10-18 09:27:10 -07:00
sys.c	Add a reference to ucounts for each cred	2021-07-14 16:55:48 +02:00
sysctl-test.c
sysctl.c	sysctl.c: fix underflow value setting risk in vm_table	2021-03-17 17:06:25 +01:00
task_work.c	task_work: cleanup notification modes	2020-10-17 15:05:30 -06:00
taskstats.c	taskstats: move specifying netlink policy back to ops	2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c
tracepoint.c	tracepoint: Use rcu get state and cond sync for static call updates	2021-09-03 10:09:30 +02:00
tsacct.c
ucount.c	Add a reference to ucounts for each cred	2021-07-14 16:55:48 +02:00
uid16.c
uid16.h
umh.c	usermodehelper: reset umask to default before executing user process	2020-10-06 10:31:52 -07:00
up.c	smp: Fix smp_call_function_single_async prototype	2021-05-14 09:50:46 +02:00
user_namespace.c	Add a reference to ucounts for each cred	2021-07-14 16:55:48 +02:00
user-return-notifier.c
user.c
usermode_driver.c	bpf: Fix umd memory leak in copy_process()	2021-03-30 14:32:03 +02:00
utsname_sysctl.c
utsname.c
watch_queue.c	watch_queue: Limit the number of watches a user can hold	2020-08-17 09:39:18 -07:00
watchdog_hld.c
watchdog.c	watchdog: fix barriers when printing backtraces from all CPUs	2021-05-19 10:13:00 +02:00
workqueue_internal.h
workqueue.c	workqueue: fix UAF in pwq_unbound_release_workfn()	2021-07-31 08:16:11 +02:00