kernel_optimize_test/kernel
Ingo Molnar 1251201c0d sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code
Thadeu Lima de Souza Cascardo reported that 'chrt' broke on recent kernels:

  $ chrt -p $$
  chrt: failed to get pid 26306's policy: Argument list too long

and he has root-caused the bug to the following commit increasing sched_attr
size and breaking sched_read_attr() into returning -EFBIG:

  a509a7cd79 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")

The other, bigger bug is that the whole sched_getattr() and sched_read_attr()
logic of checking non-zero bits in new ABI components is arguably broken,
and pretty much any extension of the ABI will spuriously break the ABI.
That's way too fragile.

Instead implement the perf syscall's extensible ABI instead, which we
already implement on the sched_setattr() side:

 - if user-attributes have the same size as kernel attributes then the
   logic is unchanged.

 - if user-attributes are larger than the kernel knows about then simply
   skip the extra bits, but set attr->size to the (smaller) kernel size
   so that tooling can (in principle) handle older kernel as well.

 - if user-attributes are smaller than the kernel knows about then just
   copy whatever user-space can accept.

Also clean up the whole logic:

 - Simplify the code flow - there's no need for 'ret' for example.

 - Standardize on 'kattr/uattr' and 'ksize/usize' naming to make sure we
   always know which side we are dealing with.

 - Why is it called 'read' when what it does is to copy to user? This
   code is so far away from VFS read() semantics that the naming is
   actively confusing. Name it sched_attr_copy_to_user() instead, which
   mirrors other copy_to_user() functionality.

 - Move the attr->size assignment from the head of sched_getattr() to the
   sched_attr_copy_to_user() function. Nothing else within the kernel
   should care about the size of the structure.

With these fixes the sched_getattr() syscall now nicely supports an
extensible ABI in both a forward and backward compatible fashion, and
will also fix the chrt bug.

As an added bonus the bogus -EFBIG return is removed as well, which as
Thadeu noted should have been -E2BIG to begin with.

Reported-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: a509a7cd79 ("sched/uclamp: Extend sched_setattr() to support utilization clamping")
Link: https://lkml.kernel.org/r/20190904075532.GA26751@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-04 19:51:30 +02:00
..
bpf bpf: handle 32-bit zext during constant blinding 2019-08-26 23:05:01 +02:00
cgroup Merge branch 'work.mount0' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-07-19 10:42:02 -07:00
configs
debug
dma dma-direct: fix zone selection after an unaddressable CMA allocation 2019-08-21 07:14:10 +09:00
events perf/core: Fix creating kernel counters for PMUs that override event->cpu 2019-07-25 15:41:31 +02:00
gcov
irq genirq: Properly pair kobject_del() with kobject_add() 2019-08-19 21:41:19 +02:00
livepatch Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching 2019-07-11 15:30:05 -07:00
locking locking/mutex: Test for initialized mutex 2019-07-25 15:39:27 +02:00
power pci-v5.3-changes 2019-07-15 20:44:49 -07:00
printk
rcu
sched sched/core: Fix uclamp ABI bug, clean up and robustify sched_read_attr() ABI logic and code 2019-09-04 19:51:30 +02:00
time timekeeping/vsyscall: Prevent math overflow in BOOTTIME update 2019-08-23 02:12:11 +02:00
trace tracing: Correct kdoc formats 2019-08-31 06:51:56 -04:00
.gitignore
acct.c
async.c
audit_fsnotify.c
audit_tree.c
audit_watch.c
audit.c
audit.h
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
compat.c
configs.c kernel/configs: Replace GPL boilerplate code with SPDX identifier 2019-07-30 18:34:15 +02:00
context_tracking.c
cpu_pm.c
cpu.c
crash_core.c
crash_dump.c
cred.c Merge branch 'access-creds' 2019-07-25 08:36:29 -07:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c exit: make setting exit_state consistent 2019-07-30 19:57:14 +02:00
extable.c
fail_function.c
fork.c sched/fair: Don't free p->numa_faults with concurrent readers 2019-07-25 15:37:04 +02:00
freezer.c
futex.c
gen_kheaders.sh
groups.c
hung_task.c
iomem.c mm/nvdimm: add is_ioremap_addr and use that to check ioremap address 2019-07-12 11:05:40 -07:00
irq_work.c
jump_label.c
kallsyms.c kallsyms: Don't let kallsyms_lookup_size_offset() fail on retrieving the first symbol 2019-08-27 16:19:56 +01:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y 2019-07-22 18:05:11 +02:00
kcov.c
kexec_core.c
kexec_file.c Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity 2019-07-08 20:28:59 -07:00
kexec_internal.h
kexec.c
kheaders.c
kmod.c
kprobes.c kprobes: Fix potential deadlock in kprobe_optimizer() 2019-08-19 12:22:19 +02:00
ksysfs.c
kthread.c
latencytop.c
Makefile memremap: move from kernel/ to mm/ 2019-08-03 07:02:01 -07:00
module_signing.c
module-internal.h
module.c modules: page-align module section allocations only for arches supporting strict module rwx 2019-08-21 10:43:56 +02:00
notifier.c
nsproxy.c
padata.c padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 2019-07-18 13:39:54 +08:00
panic.c docs: admin-guide: move sysctl directory to it 2019-07-15 11:03:01 -03:00
params.c
pid_namespace.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
pid.c kernel/pid.c: convert struct pid count to refcount_t 2019-07-16 19:23:24 -07:00
profile.c
ptrace.c ptrace: add PTRACE_GET_SYSCALL_INFO request 2019-07-16 19:23:24 -07:00
range.c
reboot.c
relay.c
resource.c resource: avoid unnecessary lookups in find_next_iomem_res() 2019-07-18 17:08:06 -07:00
rseq.c
seccomp.c
signal.c signal: Allow cifs and drbd to receive their terminating signals 2019-08-19 06:34:13 -05:00
smp.c smp: Warn on function calls from softirq context 2019-07-20 11:27:16 +02:00
smpboot.c
smpboot.h
softirq.c
stackleak.c
stacktrace.c stacktrace: Force USER_DS for stack_trace_save_user() 2019-07-18 16:47:24 +02:00
stop_machine.c
sys_ni.c
sys.c
sysctl_binary.c
sysctl.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c The main changes in this release include: 2019-07-18 11:51:00 -07:00
tsacct.c
ucount.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
uid16.c
uid16.h
umh.c
up.c
user_namespace.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
user-return-notifier.c
user.c Keyrings namespacing 2019-07-08 19:36:47 -07:00
utsname_sysctl.c
utsname.c
watchdog_hld.c
watchdog.c
workqueue_internal.h
workqueue.c