kernel_optimize_test/kernel
Waiman Long 7e849dbe60 cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()
commit 2685027fca387b602ae565bff17895188b803988 upstream.

There are 3 places where the cpu and node masks of the top cpuset can
be initialized in the order they are executed:
 1) start_kernel -> cpuset_init()
 2) start_kernel -> cgroup_init() -> cpuset_bind()
 3) kernel_init_freeable() -> do_basic_setup() -> cpuset_init_smp()

The first cpuset_init() call just sets all the bits in the masks.
The second cpuset_bind() call sets cpus_allowed and mems_allowed to the
default v2 values. The third cpuset_init_smp() call sets them back to
v1 values.

For systems with cgroup v2 setup, cpuset_bind() is called once.  As a
result, cpu and memory node hot add may fail to update the cpu and node
masks of the top cpuset to include the newly added cpu or node in a
cgroup v2 environment.

For systems with cgroup v1 setup, cpuset_bind() is called again by
rebind_subsystem() when the v1 cpuset filesystem is mounted as shown
in the dmesg log below with an instrumented kernel.

  [    2.609781] cpuset_bind() called - v2 = 1
  [    3.079473] cpuset_init_smp() called
  [    7.103710] cpuset_bind() called - v2 = 0

smp_init() is called after the first two init functions.  So we don't
have a complete list of active cpus and memory nodes until later in
cpuset_init_smp() which is the right time to set up effective_cpus
and effective_mems.

To fix this cgroup v2 mask setup problem, the potentially incorrect
cpus_allowed & mems_allowed setting in cpuset_init_smp() are removed.
For cgroup v2 systems, the initial cpuset_bind() call will set the masks
correctly.  For cgroup v1 systems, the second call to cpuset_bind()
will do the right setup.

cc: stable@vger.kernel.org
Signed-off-by: Waiman Long <longman@redhat.com>
Tested-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-18 10:23:47 +02:00
..
bpf bpf: Adjust BPF stack helper functions to accommodate skip > 0 2022-04-08 14:40:43 +02:00
cgroup cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp() 2022-05-18 10:23:47 +02:00
configs
debug kdb: Fix the putarea helper function 2022-04-08 14:40:29 +02:00
dma dma-direct: avoid redundant memory sync for swiotlb 2022-04-20 09:23:30 +02:00
entry KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest 2021-10-06 15:55:49 +02:00
events perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled 2022-04-27 13:53:56 +02:00
gcov
irq genirq: Synchronize interrupt thread startup 2022-05-12 12:25:32 +02:00
kcsan
livepatch livepatch: Fix build failure on 32 bits processors 2022-04-08 14:40:15 +02:00
locking locking/lockdep: Iterate lock_classes directly when reading lockdep files 2022-04-08 14:40:32 +02:00
power PM: suspend: fix return value of __setup handler 2022-04-08 14:40:01 +02:00
printk printk: fix return value of printk.devkmsg __setup handler 2022-04-08 14:40:08 +02:00
rcu rcu: Apply callbacks processing time limit only on softirq 2022-05-12 12:25:45 +02:00
sched sched/pelt: Fix attach_entity_load_avg() corner case 2022-04-27 13:53:56 +02:00
time timers: Fix warning condition in __run_timers() 2022-04-20 09:23:30 +02:00
trace tracing: Dump stacktrace trigger to the corresponding instance 2022-04-27 13:53:46 +02:00
.gitignore
acct.c
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:01:00 +01:00
audit_fsnotify.c
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-09-03 10:09:31 +02:00
audit_watch.c
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:30:34 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
auditfilter.c
auditsc.c audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:40:00 +02:00
backtracetest.c
bounds.c
capability.c
compat.c
configs.c
context_tracking.c
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-09-15 09:50:40 +02:00
cpu.c sched/scs: Reset task stack state in bringup_cpu() 2021-12-01 09:19:08 +01:00
crash_core.c
crash_dump.c
cred.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
delayacct.c
dma.c
exec_domain.c
exit.c
extable.c
fail_function.c
fork.c copy_process(): Move fd_install() out of sighand->siglock critical section 2022-02-23 12:01:08 +01:00
freezer.c
futex.c
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec_core.c
kexec_elf.c
kexec_file.c
kexec_internal.h
kexec.c
kheaders.c
kmod.c
kprobes.c kprobes: Limit max data_size of the kretprobe instances 2021-12-08 09:03:20 +01:00
ksysfs.c
kthread.c kthread: Fix PF_KTHREAD vs to_kthread() race 2021-09-03 10:09:31 +02:00
latencytop.c
Makefile
module_signature.c
module_signing.c
module-internal.h
module.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:01:00 +01:00
notifier.c
nsproxy.c
padata.c
panic.c
params.c
pid_namespace.c memcg: enable accounting for pids in nested pid namespaces 2021-09-18 13:40:36 +02:00
pid.c
profile.c profiling: fix shift-out-of-bounds bugs 2021-09-26 14:08:58 +02:00
ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE 2022-04-08 14:39:50 +02:00
range.c
reboot.c
regset.c
relay.c
resource.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:40:03 +02:00
scftorture.c
scs.c
seccomp.c seccomp: Fix setting loaded filter count during TSYNC 2021-08-18 08:59:06 +02:00
signal.c signal: Remove the bogus sigkill_pending in ptrace_stop 2021-11-18 14:03:47 +01:00
smp.c smp: Fix offline cpu check in flush_smp_call_function_queue() 2022-04-20 09:23:29 +02:00
smpboot.c
smpboot.h
softirq.c
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:01:00 +01:00
stacktrace.c
static_call.c static_call: Fix unused variable warn w/o MODULE 2021-09-08 08:49:00 +02:00
stop_machine.c
sys_ni.c
sys.c prctl: allow to setup brk for et_dyn executables 2021-09-26 14:08:57 +02:00
sysctl-test.c
sysctl.c x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting 2022-03-11 12:11:49 +01:00
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c tracepoint: Use rcu get state and cond sync for static call updates 2021-09-03 10:09:30 +02:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 10:54:33 +01:00
ucount.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
uid16.c
uid16.h
umh.c
up.c
user_namespace.c Revert "Add a reference to ucounts for each cred" 2021-09-08 08:49:00 +02:00
user-return-notifier.c
user.c
usermode_driver.c
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: Free the page array when watch_queue is dismantled 2022-04-08 14:40:41 +02:00
watchdog_hld.c
watchdog.c
workqueue_internal.h
workqueue.c workqueue: Fix unbind_workers() VS wq_worker_running() race 2022-01-16 09:14:22 +01:00