kernel_optimize_test/kernel
Steven Rostedt (VMware) a33614d52e tracing: Restructure trace_clock_global() to never block
commit aafe104aa9096827a429bc1358f8260ee565b7cc upstream.

It was reported that a fix to the ring buffer recursion detection would
cause a hung machine when performing suspend / resume testing. The
following backtrace was extracted from debugging that case:

Call Trace:
 trace_clock_global+0x91/0xa0
 __rb_reserve_next+0x237/0x460
 ring_buffer_lock_reserve+0x12a/0x3f0
 trace_buffer_lock_reserve+0x10/0x50
 __trace_graph_return+0x1f/0x80
 trace_graph_return+0xb7/0xf0
 ? trace_clock_global+0x91/0xa0
 ftrace_return_to_handler+0x8b/0xf0
 ? pv_hash+0xa0/0xa0
 return_to_handler+0x15/0x30
 ? ftrace_graph_caller+0xa0/0xa0
 ? trace_clock_global+0x91/0xa0
 ? __rb_reserve_next+0x237/0x460
 ? ring_buffer_lock_reserve+0x12a/0x3f0
 ? trace_event_buffer_lock_reserve+0x3c/0x120
 ? trace_event_buffer_reserve+0x6b/0xc0
 ? trace_event_raw_event_device_pm_callback_start+0x125/0x2d0
 ? dpm_run_callback+0x3b/0xc0
 ? pm_ops_is_empty+0x50/0x50
 ? platform_get_irq_byname_optional+0x90/0x90
 ? trace_device_pm_callback_start+0x82/0xd0
 ? dpm_run_callback+0x49/0xc0

With the following RIP:

RIP: 0010:native_queued_spin_lock_slowpath+0x69/0x200

Since the fix to the recursion detection would allow a single recursion to
happen while tracing, this lead to the trace_clock_global() taking a spin
lock and then trying to take it again:

ring_buffer_lock_reserve() {
  trace_clock_global() {
    arch_spin_lock() {
      queued_spin_lock_slowpath() {
        /* lock taken */
        (something else gets traced by function graph tracer)
          ring_buffer_lock_reserve() {
            trace_clock_global() {
              arch_spin_lock() {
                queued_spin_lock_slowpath() {
                /* DEAD LOCK! */

Tracing should *never* block, as it can lead to strange lockups like the
above.

Restructure the trace_clock_global() code to instead of simply taking a
lock to update the recorded "prev_time" simply use it, as two events
happening on two different CPUs that calls this at the same time, really
doesn't matter which one goes first. Use a trylock to grab the lock for
updating the prev_time, and if it fails, simply try again the next time.
If it failed to be taken, that means something else is already updating
it.

Link: https://lkml.kernel.org/r/20210430121758.650b6e8a@gandalf.local.home

Cc: stable@vger.kernel.org
Tested-by: Konstantin Kharlamov <hi-angel@yandex.ru>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Fixes: b02414c8f0 ("ring-buffer: Fix recursion protection transitions between interrupt context") # started showing the problem
Fixes: 14131f2f98 ("tracing: implement trace_clock_*() APIs") # where the bug happened
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212761
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-05-11 14:47:40 +02:00
..
bpf
cgroup
configs
debug
dma swiotlb: respect min_align_mask 2021-05-07 11:04:32 +02:00
entry
events perf: Rework perf_event_exit_event() 2021-05-11 14:47:31 +02:00
gcov
irq genirq/matrix: Prevent allocation counter corruption 2021-05-11 14:47:17 +02:00
kcsan kcsan, debugfs: Move debugfs file creation out of early init 2021-05-11 14:47:33 +02:00
livepatch
locking
power
printk
rcu kvfree_rcu: Use same set of GFP flags as does single-argument 2021-05-11 14:47:23 +02:00
sched sched,psi: Handle potential task count underflow bugs more gracefully 2021-05-11 14:47:31 +02:00
time posix-timers: Preserve return value in clock_adjtime32() 2021-05-11 14:47:16 +02:00
trace tracing: Restructure trace_clock_global() to never block 2021-05-11 14:47:40 +02:00
.gitignore kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
acct.c
async.c
audit_fsnotify.c
audit_tree.c
audit_watch.c
audit.c
audit.h
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
compat.c
configs.c
context_tracking.c
cpu_pm.c
cpu.c
crash_core.c
crash_dump.c
cred.c
delayacct.c
dma.c
exec_domain.c
exit.c
extable.c
fail_function.c
fork.c
freezer.c
futex.c futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI 2021-05-11 14:47:37 +02:00
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c
kexec_core.c
kexec_elf.c
kexec_file.c
kexec_internal.h
kexec.c
kheaders.c
kmod.c
kprobes.c
ksysfs.c
kthread.c
latencytop.c
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-11 14:47:37 +02:00
module_signature.c
module_signing.c
module-internal.h
module.c
notifier.c
nsproxy.c
padata.c
panic.c
params.c
pid_namespace.c
pid.c
profile.c
ptrace.c
range.c
reboot.c
regset.c
relay.c
resource.c
rseq.c
scftorture.c
scs.c
seccomp.c
signal.c
smp.c
smpboot.c
smpboot.h
softirq.c
stackleak.c
stacktrace.c
static_call.c
stop_machine.c
sys_ni.c
sys.c
sysctl-test.c
sysctl.c
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user_namespace.c
user-return-notifier.c
user.c
usermode_driver.c
utsname_sysctl.c
utsname.c
watch_queue.c
watchdog_hld.c
watchdog.c
workqueue_internal.h
workqueue.c