forked from luck/tmp_suning_uos_patched
ff5039ec75
[ Upstream commit ff40e51043af63715ab413995ff46996ecf9583f ] Commit59438b4647
("security,lockdown,selinux: implement SELinux lockdown") added an implementation of the locked_down LSM hook to SELinux, with the aim to restrict which domains are allowed to perform operations that would breach lockdown. This is indirectly also getting audit subsystem involved to report events. The latter is problematic, as reported by Ondrej and Serhei, since it can bring down the whole system via audit: 1) The audit events that are triggered due to calls to security_locked_down() can OOM kill a machine, see below details [0]. 2) It also seems to be causing a deadlock via avc_has_perm()/slow_avc_audit() when trying to wake up kauditd, for example, when using trace_sched_switch() tracepoint, see details in [1]. Triggering this was not via some hypothetical corner case, but with existing tools like runqlat & runqslower from bcc, for example, which make use of this tracepoint. Rough call sequence goes like: rq_lock(rq) -> -------------------------+ trace_sched_switch() -> | bpf_prog_xyz() -> +-> deadlock selinux_lockdown() -> | audit_log_end() -> | wake_up_interruptible() -> | try_to_wake_up() -> | rq_lock(rq) --------------+ What's worse is that the intention of59438b4647
to further restrict lockdown settings for specific applications in respect to the global lockdown policy is completely broken for BPF. The SELinux policy rule for the current lockdown check looks something like this: allow <who> <who> : lockdown { <reason> }; However, this doesn't match with the 'current' task where the security_locked_down() is executed, example: httpd does a syscall. There is a tracing program attached to the syscall which triggers a BPF program to run, which ends up doing a bpf_probe_read_kernel{,_str}() helper call. The selinux_lockdown() hook does the permission check against 'current', that is, httpd in this example. httpd has literally zero relation to this tracing program, and it would be nonsensical having to write an SELinux policy rule against httpd to let the tracing helper pass. The policy in this case needs to be against the entity that is installing the BPF program. For example, if bpftrace would generate a histogram of syscall counts by user space application: bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' bpftrace would then go and generate a BPF program from this internally. One way of doing it [for the sake of the example] could be to call bpf_get_current_task() helper and then access current->comm via one of bpf_probe_read_kernel{,_str}() helpers. So the program itself has nothing to do with httpd or any other random app doing a syscall here. The BPF program _explicitly initiated_ the lockdown check. The allow/deny policy belongs in the context of bpftrace: meaning, you want to grant bpftrace access to use these helpers, but other tracers on the system like my_random_tracer _not_. Therefore fix all three issues at the same time by taking a completely different approach for the security_locked_down() hook, that is, move the check into the program verification phase where we actually retrieve the BPF func proto. This also reliably gets the task (current) that is trying to install the BPF tracing program, e.g. bpftrace/bcc/perf/systemtap/etc, and it also fixes the OOM since we're moving this out of the BPF helper's fast-path which can be called several millions of times per second. The check is then also in line with other security_locked_down() hooks in the system where the enforcement is performed at open/load time, for example, open_kcore() for /proc/kcore access or module_sig_check() for module signatures just to pick few random ones. What's out of scope in the fix as well as in other security_locked_down() hook locations /outside/ of BPF subsystem is that if the lockdown policy changes on the fly there is no retrospective action. This requires a different discussion, potentially complex infrastructure, and it's also not clear whether this can be solved generically. Either way, it is out of scope for a suitable stable fix which this one is targeting. Note that the breakage is specifically on59438b4647
where it started to rely on 'current' as UAPI behavior, and _not_ earlier infrastructure such as9d1f8be5cf
("bpf: Restrict bpf when kernel lockdown is in confidentiality mode"). [0] https://bugzilla.redhat.com/show_bug.cgi?id=1955585, Jakub Hrozek says: I starting seeing this with F-34. When I run a container that is traced with BPF to record the syscalls it is doing, auditd is flooded with messages like: type=AVC msg=audit(1619784520.593:282387): avc: denied { confidentiality } for pid=476 comm="auditd" lockdown_reason="use of bpf to read kernel RAM" scontext=system_u:system_r:auditd_t:s0 tcontext=system_u:system_r:auditd_t:s0 tclass=lockdown permissive=0 This seems to be leading to auditd running out of space in the backlog buffer and eventually OOMs the machine. [...] auditd running at 99% CPU presumably processing all the messages, eventually I get: Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: backlog limit exceeded Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152579 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152626 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_backlog=2152694 > audit_backlog_limit=64 Apr 30 12:20:42 fedora kernel: audit: audit_lost=6878426 audit_rate_limit=0 audit_backlog_limit=64 Apr 30 12:20:45 fedora kernel: oci-seccomp-bpf invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000 Apr 30 12:20:45 fedora kernel: CPU: 0 PID: 13284 Comm: oci-seccomp-bpf Not tainted 5.11.12-300.fc34.x86_64 #1 Apr 30 12:20:45 fedora kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.fc32 04/01/2014 [...] [1] https://lore.kernel.org/linux-audit/CANYvDQN7H5tVp47fbYcRasv4XF07eUbsDwT_eDCHXJUj43J7jQ@mail.gmail.com/, Serhei Makarov says: Upstream kernel 5.11.0-rc7 and later was found to deadlock during a bpf_probe_read_compat() call within a sched_switch tracepoint. The problem is reproducible with the reg_alloc3 testcase from SystemTap's BPF backend testsuite on x86_64 as well as the runqlat, runqslower tools from bcc on ppc64le. Example stack trace: [...] [ 730.868702] stack backtrace: [ 730.869590] CPU: 1 PID: 701 Comm: in:imjournal Not tainted, 5.12.0-0.rc2.20210309git144c79ef3353.166.fc35.x86_64 #1 [ 730.871605] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 [ 730.873278] Call Trace: [ 730.873770] dump_stack+0x7f/0xa1 [ 730.874433] check_noncircular+0xdf/0x100 [ 730.875232] __lock_acquire+0x1202/0x1e10 [ 730.876031] ? __lock_acquire+0xfc0/0x1e10 [ 730.876844] lock_acquire+0xc2/0x3a0 [ 730.877551] ? __wake_up_common_lock+0x52/0x90 [ 730.878434] ? lock_acquire+0xc2/0x3a0 [ 730.879186] ? lock_is_held_type+0xa7/0x120 [ 730.880044] ? skb_queue_tail+0x1b/0x50 [ 730.880800] _raw_spin_lock_irqsave+0x4d/0x90 [ 730.881656] ? __wake_up_common_lock+0x52/0x90 [ 730.882532] __wake_up_common_lock+0x52/0x90 [ 730.883375] audit_log_end+0x5b/0x100 [ 730.884104] slow_avc_audit+0x69/0x90 [ 730.884836] avc_has_perm+0x8b/0xb0 [ 730.885532] selinux_lockdown+0xa5/0xd0 [ 730.886297] security_locked_down+0x20/0x40 [ 730.887133] bpf_probe_read_compat+0x66/0xd0 [ 730.887983] bpf_prog_250599c5469ac7b5+0x10f/0x820 [ 730.888917] trace_call_bpf+0xe9/0x240 [ 730.889672] perf_trace_run_bpf_submit+0x4d/0xc0 [ 730.890579] perf_trace_sched_switch+0x142/0x180 [ 730.891485] ? __schedule+0x6d8/0xb20 [ 730.892209] __schedule+0x6d8/0xb20 [ 730.892899] schedule+0x5b/0xc0 [ 730.893522] exit_to_user_mode_prepare+0x11d/0x240 [ 730.894457] syscall_exit_to_user_mode+0x27/0x70 [ 730.895361] entry_SYSCALL_64_after_hwframe+0x44/0xae [...] Fixes:59438b4647
("security,lockdown,selinux: implement SELinux lockdown") Reported-by: Ondrej Mosnacek <omosnace@redhat.com> Reported-by: Jakub Hrozek <jhrozek@redhat.com> Reported-by: Serhei Makarov <smakarov@redhat.com> Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: James Morris <jamorris@linux.microsoft.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Frank Eigler <fche@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/bpf/01135120-8bf7-df2e-cff0-1d73f1f841c3@iogearbox.net Signed-off-by: Sasha Levin <sashal@kernel.org>
745 lines
17 KiB
C
745 lines
17 KiB
C
// SPDX-License-Identifier: GPL-2.0-only
|
|
/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
|
|
*/
|
|
#include <linux/bpf.h>
|
|
#include <linux/rcupdate.h>
|
|
#include <linux/random.h>
|
|
#include <linux/smp.h>
|
|
#include <linux/topology.h>
|
|
#include <linux/ktime.h>
|
|
#include <linux/sched.h>
|
|
#include <linux/uidgid.h>
|
|
#include <linux/filter.h>
|
|
#include <linux/ctype.h>
|
|
#include <linux/jiffies.h>
|
|
#include <linux/pid_namespace.h>
|
|
#include <linux/proc_ns.h>
|
|
#include <linux/security.h>
|
|
|
|
#include "../../lib/kstrtox.h"
|
|
|
|
/* If kernel subsystem is allowing eBPF programs to call this function,
|
|
* inside its own verifier_ops->get_func_proto() callback it should return
|
|
* bpf_map_lookup_elem_proto, so that verifier can properly check the arguments
|
|
*
|
|
* Different map implementations will rely on rcu in map methods
|
|
* lookup/update/delete, therefore eBPF programs must run under rcu lock
|
|
* if program is allowed to access maps, so check rcu_read_lock_held in
|
|
* all three functions.
|
|
*/
|
|
BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
|
|
{
|
|
WARN_ON_ONCE(!rcu_read_lock_held());
|
|
return (unsigned long) map->ops->map_lookup_elem(map, key);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_map_lookup_elem_proto = {
|
|
.func = bpf_map_lookup_elem,
|
|
.gpl_only = false,
|
|
.pkt_access = true,
|
|
.ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
|
|
.arg1_type = ARG_CONST_MAP_PTR,
|
|
.arg2_type = ARG_PTR_TO_MAP_KEY,
|
|
};
|
|
|
|
BPF_CALL_4(bpf_map_update_elem, struct bpf_map *, map, void *, key,
|
|
void *, value, u64, flags)
|
|
{
|
|
WARN_ON_ONCE(!rcu_read_lock_held());
|
|
return map->ops->map_update_elem(map, key, value, flags);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_map_update_elem_proto = {
|
|
.func = bpf_map_update_elem,
|
|
.gpl_only = false,
|
|
.pkt_access = true,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_CONST_MAP_PTR,
|
|
.arg2_type = ARG_PTR_TO_MAP_KEY,
|
|
.arg3_type = ARG_PTR_TO_MAP_VALUE,
|
|
.arg4_type = ARG_ANYTHING,
|
|
};
|
|
|
|
BPF_CALL_2(bpf_map_delete_elem, struct bpf_map *, map, void *, key)
|
|
{
|
|
WARN_ON_ONCE(!rcu_read_lock_held());
|
|
return map->ops->map_delete_elem(map, key);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_map_delete_elem_proto = {
|
|
.func = bpf_map_delete_elem,
|
|
.gpl_only = false,
|
|
.pkt_access = true,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_CONST_MAP_PTR,
|
|
.arg2_type = ARG_PTR_TO_MAP_KEY,
|
|
};
|
|
|
|
BPF_CALL_3(bpf_map_push_elem, struct bpf_map *, map, void *, value, u64, flags)
|
|
{
|
|
return map->ops->map_push_elem(map, value, flags);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_map_push_elem_proto = {
|
|
.func = bpf_map_push_elem,
|
|
.gpl_only = false,
|
|
.pkt_access = true,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_CONST_MAP_PTR,
|
|
.arg2_type = ARG_PTR_TO_MAP_VALUE,
|
|
.arg3_type = ARG_ANYTHING,
|
|
};
|
|
|
|
BPF_CALL_2(bpf_map_pop_elem, struct bpf_map *, map, void *, value)
|
|
{
|
|
return map->ops->map_pop_elem(map, value);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_map_pop_elem_proto = {
|
|
.func = bpf_map_pop_elem,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_CONST_MAP_PTR,
|
|
.arg2_type = ARG_PTR_TO_UNINIT_MAP_VALUE,
|
|
};
|
|
|
|
BPF_CALL_2(bpf_map_peek_elem, struct bpf_map *, map, void *, value)
|
|
{
|
|
return map->ops->map_peek_elem(map, value);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_map_peek_elem_proto = {
|
|
.func = bpf_map_peek_elem,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_CONST_MAP_PTR,
|
|
.arg2_type = ARG_PTR_TO_UNINIT_MAP_VALUE,
|
|
};
|
|
|
|
const struct bpf_func_proto bpf_get_prandom_u32_proto = {
|
|
.func = bpf_user_rnd_u32,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_0(bpf_get_smp_processor_id)
|
|
{
|
|
return smp_processor_id();
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
|
|
.func = bpf_get_smp_processor_id,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_0(bpf_get_numa_node_id)
|
|
{
|
|
return numa_node_id();
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_numa_node_id_proto = {
|
|
.func = bpf_get_numa_node_id,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_0(bpf_ktime_get_ns)
|
|
{
|
|
/* NMI safe access to clock monotonic */
|
|
return ktime_get_mono_fast_ns();
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_ktime_get_ns_proto = {
|
|
.func = bpf_ktime_get_ns,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_0(bpf_ktime_get_boot_ns)
|
|
{
|
|
/* NMI safe access to clock boottime */
|
|
return ktime_get_boot_fast_ns();
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_ktime_get_boot_ns_proto = {
|
|
.func = bpf_ktime_get_boot_ns,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_0(bpf_get_current_pid_tgid)
|
|
{
|
|
struct task_struct *task = current;
|
|
|
|
if (unlikely(!task))
|
|
return -EINVAL;
|
|
|
|
return (u64) task->tgid << 32 | task->pid;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_current_pid_tgid_proto = {
|
|
.func = bpf_get_current_pid_tgid,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_0(bpf_get_current_uid_gid)
|
|
{
|
|
struct task_struct *task = current;
|
|
kuid_t uid;
|
|
kgid_t gid;
|
|
|
|
if (unlikely(!task))
|
|
return -EINVAL;
|
|
|
|
current_uid_gid(&uid, &gid);
|
|
return (u64) from_kgid(&init_user_ns, gid) << 32 |
|
|
from_kuid(&init_user_ns, uid);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_current_uid_gid_proto = {
|
|
.func = bpf_get_current_uid_gid,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_2(bpf_get_current_comm, char *, buf, u32, size)
|
|
{
|
|
struct task_struct *task = current;
|
|
|
|
if (unlikely(!task))
|
|
goto err_clear;
|
|
|
|
strncpy(buf, task->comm, size);
|
|
|
|
/* Verifier guarantees that size > 0. For task->comm exceeding
|
|
* size, guarantee that buf is %NUL-terminated. Unconditionally
|
|
* done here to save the size test.
|
|
*/
|
|
buf[size - 1] = 0;
|
|
return 0;
|
|
err_clear:
|
|
memset(buf, 0, size);
|
|
return -EINVAL;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_current_comm_proto = {
|
|
.func = bpf_get_current_comm,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
|
|
.arg2_type = ARG_CONST_SIZE,
|
|
};
|
|
|
|
#if defined(CONFIG_QUEUED_SPINLOCKS) || defined(CONFIG_BPF_ARCH_SPINLOCK)
|
|
|
|
static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
|
|
{
|
|
arch_spinlock_t *l = (void *)lock;
|
|
union {
|
|
__u32 val;
|
|
arch_spinlock_t lock;
|
|
} u = { .lock = __ARCH_SPIN_LOCK_UNLOCKED };
|
|
|
|
compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
|
|
BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
|
|
BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
|
|
arch_spin_lock(l);
|
|
}
|
|
|
|
static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
|
|
{
|
|
arch_spinlock_t *l = (void *)lock;
|
|
|
|
arch_spin_unlock(l);
|
|
}
|
|
|
|
#else
|
|
|
|
static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
|
|
{
|
|
atomic_t *l = (void *)lock;
|
|
|
|
BUILD_BUG_ON(sizeof(*l) != sizeof(*lock));
|
|
do {
|
|
atomic_cond_read_relaxed(l, !VAL);
|
|
} while (atomic_xchg(l, 1));
|
|
}
|
|
|
|
static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
|
|
{
|
|
atomic_t *l = (void *)lock;
|
|
|
|
atomic_set_release(l, 0);
|
|
}
|
|
|
|
#endif
|
|
|
|
static DEFINE_PER_CPU(unsigned long, irqsave_flags);
|
|
|
|
notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
|
|
{
|
|
unsigned long flags;
|
|
|
|
local_irq_save(flags);
|
|
__bpf_spin_lock(lock);
|
|
__this_cpu_write(irqsave_flags, flags);
|
|
return 0;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_spin_lock_proto = {
|
|
.func = bpf_spin_lock,
|
|
.gpl_only = false,
|
|
.ret_type = RET_VOID,
|
|
.arg1_type = ARG_PTR_TO_SPIN_LOCK,
|
|
};
|
|
|
|
notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock *, lock)
|
|
{
|
|
unsigned long flags;
|
|
|
|
flags = __this_cpu_read(irqsave_flags);
|
|
__bpf_spin_unlock(lock);
|
|
local_irq_restore(flags);
|
|
return 0;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_spin_unlock_proto = {
|
|
.func = bpf_spin_unlock,
|
|
.gpl_only = false,
|
|
.ret_type = RET_VOID,
|
|
.arg1_type = ARG_PTR_TO_SPIN_LOCK,
|
|
};
|
|
|
|
void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
|
|
bool lock_src)
|
|
{
|
|
struct bpf_spin_lock *lock;
|
|
|
|
if (lock_src)
|
|
lock = src + map->spin_lock_off;
|
|
else
|
|
lock = dst + map->spin_lock_off;
|
|
preempt_disable();
|
|
____bpf_spin_lock(lock);
|
|
copy_map_value(map, dst, src);
|
|
____bpf_spin_unlock(lock);
|
|
preempt_enable();
|
|
}
|
|
|
|
BPF_CALL_0(bpf_jiffies64)
|
|
{
|
|
return get_jiffies_64();
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_jiffies64_proto = {
|
|
.func = bpf_jiffies64,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
#ifdef CONFIG_CGROUPS
|
|
BPF_CALL_0(bpf_get_current_cgroup_id)
|
|
{
|
|
struct cgroup *cgrp = task_dfl_cgroup(current);
|
|
|
|
return cgroup_id(cgrp);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_current_cgroup_id_proto = {
|
|
.func = bpf_get_current_cgroup_id,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_1(bpf_get_current_ancestor_cgroup_id, int, ancestor_level)
|
|
{
|
|
struct cgroup *cgrp = task_dfl_cgroup(current);
|
|
struct cgroup *ancestor;
|
|
|
|
ancestor = cgroup_ancestor(cgrp, ancestor_level);
|
|
if (!ancestor)
|
|
return 0;
|
|
return cgroup_id(ancestor);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto = {
|
|
.func = bpf_get_current_ancestor_cgroup_id,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_ANYTHING,
|
|
};
|
|
|
|
#ifdef CONFIG_CGROUP_BPF
|
|
DECLARE_PER_CPU(struct bpf_cgroup_storage*,
|
|
bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
|
|
|
|
BPF_CALL_2(bpf_get_local_storage, struct bpf_map *, map, u64, flags)
|
|
{
|
|
/* flags argument is not used now,
|
|
* but provides an ability to extend the API.
|
|
* verifier checks that its value is correct.
|
|
*/
|
|
enum bpf_cgroup_storage_type stype = cgroup_storage_type(map);
|
|
struct bpf_cgroup_storage *storage;
|
|
void *ptr;
|
|
|
|
storage = this_cpu_read(bpf_cgroup_storage[stype]);
|
|
|
|
if (stype == BPF_CGROUP_STORAGE_SHARED)
|
|
ptr = &READ_ONCE(storage->buf)->data[0];
|
|
else
|
|
ptr = this_cpu_ptr(storage->percpu_buf);
|
|
|
|
return (unsigned long)ptr;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_local_storage_proto = {
|
|
.func = bpf_get_local_storage,
|
|
.gpl_only = false,
|
|
.ret_type = RET_PTR_TO_MAP_VALUE,
|
|
.arg1_type = ARG_CONST_MAP_PTR,
|
|
.arg2_type = ARG_ANYTHING,
|
|
};
|
|
#endif
|
|
|
|
#define BPF_STRTOX_BASE_MASK 0x1F
|
|
|
|
static int __bpf_strtoull(const char *buf, size_t buf_len, u64 flags,
|
|
unsigned long long *res, bool *is_negative)
|
|
{
|
|
unsigned int base = flags & BPF_STRTOX_BASE_MASK;
|
|
const char *cur_buf = buf;
|
|
size_t cur_len = buf_len;
|
|
unsigned int consumed;
|
|
size_t val_len;
|
|
char str[64];
|
|
|
|
if (!buf || !buf_len || !res || !is_negative)
|
|
return -EINVAL;
|
|
|
|
if (base != 0 && base != 8 && base != 10 && base != 16)
|
|
return -EINVAL;
|
|
|
|
if (flags & ~BPF_STRTOX_BASE_MASK)
|
|
return -EINVAL;
|
|
|
|
while (cur_buf < buf + buf_len && isspace(*cur_buf))
|
|
++cur_buf;
|
|
|
|
*is_negative = (cur_buf < buf + buf_len && *cur_buf == '-');
|
|
if (*is_negative)
|
|
++cur_buf;
|
|
|
|
consumed = cur_buf - buf;
|
|
cur_len -= consumed;
|
|
if (!cur_len)
|
|
return -EINVAL;
|
|
|
|
cur_len = min(cur_len, sizeof(str) - 1);
|
|
memcpy(str, cur_buf, cur_len);
|
|
str[cur_len] = '\0';
|
|
cur_buf = str;
|
|
|
|
cur_buf = _parse_integer_fixup_radix(cur_buf, &base);
|
|
val_len = _parse_integer(cur_buf, base, res);
|
|
|
|
if (val_len & KSTRTOX_OVERFLOW)
|
|
return -ERANGE;
|
|
|
|
if (val_len == 0)
|
|
return -EINVAL;
|
|
|
|
cur_buf += val_len;
|
|
consumed += cur_buf - str;
|
|
|
|
return consumed;
|
|
}
|
|
|
|
static int __bpf_strtoll(const char *buf, size_t buf_len, u64 flags,
|
|
long long *res)
|
|
{
|
|
unsigned long long _res;
|
|
bool is_negative;
|
|
int err;
|
|
|
|
err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
|
|
if (err < 0)
|
|
return err;
|
|
if (is_negative) {
|
|
if ((long long)-_res > 0)
|
|
return -ERANGE;
|
|
*res = -_res;
|
|
} else {
|
|
if ((long long)_res < 0)
|
|
return -ERANGE;
|
|
*res = _res;
|
|
}
|
|
return err;
|
|
}
|
|
|
|
BPF_CALL_4(bpf_strtol, const char *, buf, size_t, buf_len, u64, flags,
|
|
long *, res)
|
|
{
|
|
long long _res;
|
|
int err;
|
|
|
|
err = __bpf_strtoll(buf, buf_len, flags, &_res);
|
|
if (err < 0)
|
|
return err;
|
|
if (_res != (long)_res)
|
|
return -ERANGE;
|
|
*res = _res;
|
|
return err;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_strtol_proto = {
|
|
.func = bpf_strtol,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_PTR_TO_MEM,
|
|
.arg2_type = ARG_CONST_SIZE,
|
|
.arg3_type = ARG_ANYTHING,
|
|
.arg4_type = ARG_PTR_TO_LONG,
|
|
};
|
|
|
|
BPF_CALL_4(bpf_strtoul, const char *, buf, size_t, buf_len, u64, flags,
|
|
unsigned long *, res)
|
|
{
|
|
unsigned long long _res;
|
|
bool is_negative;
|
|
int err;
|
|
|
|
err = __bpf_strtoull(buf, buf_len, flags, &_res, &is_negative);
|
|
if (err < 0)
|
|
return err;
|
|
if (is_negative)
|
|
return -EINVAL;
|
|
if (_res != (unsigned long)_res)
|
|
return -ERANGE;
|
|
*res = _res;
|
|
return err;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_strtoul_proto = {
|
|
.func = bpf_strtoul,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_PTR_TO_MEM,
|
|
.arg2_type = ARG_CONST_SIZE,
|
|
.arg3_type = ARG_ANYTHING,
|
|
.arg4_type = ARG_PTR_TO_LONG,
|
|
};
|
|
#endif
|
|
|
|
BPF_CALL_4(bpf_get_ns_current_pid_tgid, u64, dev, u64, ino,
|
|
struct bpf_pidns_info *, nsdata, u32, size)
|
|
{
|
|
struct task_struct *task = current;
|
|
struct pid_namespace *pidns;
|
|
int err = -EINVAL;
|
|
|
|
if (unlikely(size != sizeof(struct bpf_pidns_info)))
|
|
goto clear;
|
|
|
|
if (unlikely((u64)(dev_t)dev != dev))
|
|
goto clear;
|
|
|
|
if (unlikely(!task))
|
|
goto clear;
|
|
|
|
pidns = task_active_pid_ns(task);
|
|
if (unlikely(!pidns)) {
|
|
err = -ENOENT;
|
|
goto clear;
|
|
}
|
|
|
|
if (!ns_match(&pidns->ns, (dev_t)dev, ino))
|
|
goto clear;
|
|
|
|
nsdata->pid = task_pid_nr_ns(task, pidns);
|
|
nsdata->tgid = task_tgid_nr_ns(task, pidns);
|
|
return 0;
|
|
clear:
|
|
memset((void *)nsdata, 0, (size_t) size);
|
|
return err;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_get_ns_current_pid_tgid_proto = {
|
|
.func = bpf_get_ns_current_pid_tgid,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_ANYTHING,
|
|
.arg2_type = ARG_ANYTHING,
|
|
.arg3_type = ARG_PTR_TO_UNINIT_MEM,
|
|
.arg4_type = ARG_CONST_SIZE,
|
|
};
|
|
|
|
static const struct bpf_func_proto bpf_get_raw_smp_processor_id_proto = {
|
|
.func = bpf_get_raw_cpu_id,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
};
|
|
|
|
BPF_CALL_5(bpf_event_output_data, void *, ctx, struct bpf_map *, map,
|
|
u64, flags, void *, data, u64, size)
|
|
{
|
|
if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
|
|
return -EINVAL;
|
|
|
|
return bpf_event_output(map, flags, data, size, NULL, 0, NULL);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_event_output_data_proto = {
|
|
.func = bpf_event_output_data,
|
|
.gpl_only = true,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_PTR_TO_CTX,
|
|
.arg2_type = ARG_CONST_MAP_PTR,
|
|
.arg3_type = ARG_ANYTHING,
|
|
.arg4_type = ARG_PTR_TO_MEM,
|
|
.arg5_type = ARG_CONST_SIZE_OR_ZERO,
|
|
};
|
|
|
|
BPF_CALL_3(bpf_copy_from_user, void *, dst, u32, size,
|
|
const void __user *, user_ptr)
|
|
{
|
|
int ret = copy_from_user(dst, user_ptr, size);
|
|
|
|
if (unlikely(ret)) {
|
|
memset(dst, 0, size);
|
|
ret = -EFAULT;
|
|
}
|
|
|
|
return ret;
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_copy_from_user_proto = {
|
|
.func = bpf_copy_from_user,
|
|
.gpl_only = false,
|
|
.ret_type = RET_INTEGER,
|
|
.arg1_type = ARG_PTR_TO_UNINIT_MEM,
|
|
.arg2_type = ARG_CONST_SIZE_OR_ZERO,
|
|
.arg3_type = ARG_ANYTHING,
|
|
};
|
|
|
|
BPF_CALL_2(bpf_per_cpu_ptr, const void *, ptr, u32, cpu)
|
|
{
|
|
if (cpu >= nr_cpu_ids)
|
|
return (unsigned long)NULL;
|
|
|
|
return (unsigned long)per_cpu_ptr((const void __percpu *)ptr, cpu);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_per_cpu_ptr_proto = {
|
|
.func = bpf_per_cpu_ptr,
|
|
.gpl_only = false,
|
|
.ret_type = RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL,
|
|
.arg1_type = ARG_PTR_TO_PERCPU_BTF_ID,
|
|
.arg2_type = ARG_ANYTHING,
|
|
};
|
|
|
|
BPF_CALL_1(bpf_this_cpu_ptr, const void *, percpu_ptr)
|
|
{
|
|
return (unsigned long)this_cpu_ptr((const void __percpu *)percpu_ptr);
|
|
}
|
|
|
|
const struct bpf_func_proto bpf_this_cpu_ptr_proto = {
|
|
.func = bpf_this_cpu_ptr,
|
|
.gpl_only = false,
|
|
.ret_type = RET_PTR_TO_MEM_OR_BTF_ID,
|
|
.arg1_type = ARG_PTR_TO_PERCPU_BTF_ID,
|
|
};
|
|
|
|
const struct bpf_func_proto bpf_get_current_task_proto __weak;
|
|
const struct bpf_func_proto bpf_probe_read_user_proto __weak;
|
|
const struct bpf_func_proto bpf_probe_read_user_str_proto __weak;
|
|
const struct bpf_func_proto bpf_probe_read_kernel_proto __weak;
|
|
const struct bpf_func_proto bpf_probe_read_kernel_str_proto __weak;
|
|
|
|
const struct bpf_func_proto *
|
|
bpf_base_func_proto(enum bpf_func_id func_id)
|
|
{
|
|
switch (func_id) {
|
|
case BPF_FUNC_map_lookup_elem:
|
|
return &bpf_map_lookup_elem_proto;
|
|
case BPF_FUNC_map_update_elem:
|
|
return &bpf_map_update_elem_proto;
|
|
case BPF_FUNC_map_delete_elem:
|
|
return &bpf_map_delete_elem_proto;
|
|
case BPF_FUNC_map_push_elem:
|
|
return &bpf_map_push_elem_proto;
|
|
case BPF_FUNC_map_pop_elem:
|
|
return &bpf_map_pop_elem_proto;
|
|
case BPF_FUNC_map_peek_elem:
|
|
return &bpf_map_peek_elem_proto;
|
|
case BPF_FUNC_get_prandom_u32:
|
|
return &bpf_get_prandom_u32_proto;
|
|
case BPF_FUNC_get_smp_processor_id:
|
|
return &bpf_get_raw_smp_processor_id_proto;
|
|
case BPF_FUNC_get_numa_node_id:
|
|
return &bpf_get_numa_node_id_proto;
|
|
case BPF_FUNC_tail_call:
|
|
return &bpf_tail_call_proto;
|
|
case BPF_FUNC_ktime_get_ns:
|
|
return &bpf_ktime_get_ns_proto;
|
|
case BPF_FUNC_ktime_get_boot_ns:
|
|
return &bpf_ktime_get_boot_ns_proto;
|
|
case BPF_FUNC_ringbuf_output:
|
|
return &bpf_ringbuf_output_proto;
|
|
case BPF_FUNC_ringbuf_reserve:
|
|
return &bpf_ringbuf_reserve_proto;
|
|
case BPF_FUNC_ringbuf_submit:
|
|
return &bpf_ringbuf_submit_proto;
|
|
case BPF_FUNC_ringbuf_discard:
|
|
return &bpf_ringbuf_discard_proto;
|
|
case BPF_FUNC_ringbuf_query:
|
|
return &bpf_ringbuf_query_proto;
|
|
default:
|
|
break;
|
|
}
|
|
|
|
if (!bpf_capable())
|
|
return NULL;
|
|
|
|
switch (func_id) {
|
|
case BPF_FUNC_spin_lock:
|
|
return &bpf_spin_lock_proto;
|
|
case BPF_FUNC_spin_unlock:
|
|
return &bpf_spin_unlock_proto;
|
|
case BPF_FUNC_jiffies64:
|
|
return &bpf_jiffies64_proto;
|
|
case BPF_FUNC_per_cpu_ptr:
|
|
return &bpf_per_cpu_ptr_proto;
|
|
case BPF_FUNC_this_cpu_ptr:
|
|
return &bpf_this_cpu_ptr_proto;
|
|
default:
|
|
break;
|
|
}
|
|
|
|
if (!perfmon_capable())
|
|
return NULL;
|
|
|
|
switch (func_id) {
|
|
case BPF_FUNC_trace_printk:
|
|
return bpf_get_trace_printk_proto();
|
|
case BPF_FUNC_get_current_task:
|
|
return &bpf_get_current_task_proto;
|
|
case BPF_FUNC_probe_read_user:
|
|
return &bpf_probe_read_user_proto;
|
|
case BPF_FUNC_probe_read_kernel:
|
|
return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
|
|
NULL : &bpf_probe_read_kernel_proto;
|
|
case BPF_FUNC_probe_read_user_str:
|
|
return &bpf_probe_read_user_str_proto;
|
|
case BPF_FUNC_probe_read_kernel_str:
|
|
return security_locked_down(LOCKDOWN_BPF_READ) < 0 ?
|
|
NULL : &bpf_probe_read_kernel_str_proto;
|
|
case BPF_FUNC_snprintf_btf:
|
|
return &bpf_snprintf_btf_proto;
|
|
default:
|
|
return NULL;
|
|
}
|
|
}
|