Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/kcsan

Pull the KCSAN subsystem from Paul E. McKenney:

   "This pull request contains base kernel concurrency sanitizer
    (KCSAN) enablement for x86, courtesy of Marco Elver.  KCSAN is a
    sampling watchpoint-based data-race detector, and is documented in
    Documentation/dev-tools/kcsan.rst.  KCSAN was announced in September,
    and much feedback has since been incorporated:

      http://lkml.kernel.org/r/CANpmjNPJ_bHjfLZCAPV23AXFfiPiyXXqqu72n6TgWzb2Gnu1eA@mail.gmail.com

    The data races located thus far have resulted in a number of fixes:

      https://github.com/google/ktsan/wiki/KCSAN#upstream-fixes-of-data-races-found-by-kcsan

    Additional information may be found here:

      https://lore.kernel.org/lkml/20191114180303.66955-1-elver@google.com/
   "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
This commit is contained in:
Ingo Molnar 2019-11-19 19:54:39 +01:00
commit 8e1d58ae0c
45 changed files with 2623 additions and 206 deletions

View File

@ -21,6 +21,7 @@ whole; patches welcome!
kasan
ubsan
kmemleak
kcsan
gdb-kernel-debugging
kgdb
kselftest

View File

@ -0,0 +1,256 @@
The Kernel Concurrency Sanitizer (KCSAN)
========================================
Overview
--------
*Kernel Concurrency Sanitizer (KCSAN)* is a dynamic data race detector for
kernel space. KCSAN is a sampling watchpoint-based data race detector. Key
priorities in KCSAN's design are lack of false positives, scalability, and
simplicity. More details can be found in `Implementation Details`_.
KCSAN uses compile-time instrumentation to instrument memory accesses. KCSAN is
supported in both GCC and Clang. With GCC it requires version 7.3.0 or later.
With Clang it requires version 7.0.0 or later.
Usage
-----
To enable KCSAN configure kernel with::
CONFIG_KCSAN = y
KCSAN provides several other configuration options to customize behaviour (see
their respective help text for more info).
Error reports
~~~~~~~~~~~~~
A typical data race report looks like this::
==================================================================
BUG: KCSAN: data-race in generic_permission / kernfs_refresh_inode
write to 0xffff8fee4c40700c of 4 bytes by task 175 on cpu 4:
kernfs_refresh_inode+0x70/0x170
kernfs_iop_permission+0x4f/0x90
inode_permission+0x190/0x200
link_path_walk.part.0+0x503/0x8e0
path_lookupat.isra.0+0x69/0x4d0
filename_lookup+0x136/0x280
user_path_at_empty+0x47/0x60
vfs_statx+0x9b/0x130
__do_sys_newlstat+0x50/0xb0
__x64_sys_newlstat+0x37/0x50
do_syscall_64+0x85/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
read to 0xffff8fee4c40700c of 4 bytes by task 166 on cpu 6:
generic_permission+0x5b/0x2a0
kernfs_iop_permission+0x66/0x90
inode_permission+0x190/0x200
link_path_walk.part.0+0x503/0x8e0
path_lookupat.isra.0+0x69/0x4d0
filename_lookup+0x136/0x280
user_path_at_empty+0x47/0x60
do_faccessat+0x11a/0x390
__x64_sys_access+0x3c/0x50
do_syscall_64+0x85/0x260
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported by Kernel Concurrency Sanitizer on:
CPU: 6 PID: 166 Comm: systemd-journal Not tainted 5.3.0-rc7+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
==================================================================
The header of the report provides a short summary of the functions involved in
the race. It is followed by the access types and stack traces of the 2 threads
involved in the data race.
The other less common type of data race report looks like this::
==================================================================
BUG: KCSAN: data-race in e1000_clean_rx_irq+0x551/0xb10
race at unknown origin, with read to 0xffff933db8a2ae6c of 1 bytes by interrupt on cpu 0:
e1000_clean_rx_irq+0x551/0xb10
e1000_clean+0x533/0xda0
net_rx_action+0x329/0x900
__do_softirq+0xdb/0x2db
irq_exit+0x9b/0xa0
do_IRQ+0x9c/0xf0
ret_from_intr+0x0/0x18
default_idle+0x3f/0x220
arch_cpu_idle+0x21/0x30
do_idle+0x1df/0x230
cpu_startup_entry+0x14/0x20
rest_init+0xc5/0xcb
arch_call_rest_init+0x13/0x2b
start_kernel+0x6db/0x700
Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.3.0-rc7+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
==================================================================
This report is generated where it was not possible to determine the other
racing thread, but a race was inferred due to the data value of the watched
memory location having changed. These can occur either due to missing
instrumentation or e.g. DMA accesses.
Selective analysis
~~~~~~~~~~~~~~~~~~
To disable KCSAN data race detection for an entire subsystem, add to the
respective ``Makefile``::
KCSAN_SANITIZE := n
To disable KCSAN on a per-file basis, add to the ``Makefile``::
KCSAN_SANITIZE_file.o := n
KCSAN also understands the ``data_race(expr)`` annotation, which tells KCSAN
that any data races due to accesses in ``expr`` should be ignored and resulting
behaviour when encountering a data race is deemed safe.
debugfs
~~~~~~~
* The file ``/sys/kernel/debug/kcsan`` can be read to get stats.
* KCSAN can be turned on or off by writing ``on`` or ``off`` to
``/sys/kernel/debug/kcsan``.
* Writing ``!some_func_name`` to ``/sys/kernel/debug/kcsan`` adds
``some_func_name`` to the report filter list, which (by default) blacklists
reporting data races where either one of the top stackframes are a function
in the list.
* Writing either ``blacklist`` or ``whitelist`` to ``/sys/kernel/debug/kcsan``
changes the report filtering behaviour. For example, the blacklist feature
can be used to silence frequently occurring data races; the whitelist feature
can help with reproduction and testing of fixes.
Data Races
----------
Informally, two operations *conflict* if they access the same memory location,
and at least one of them is a write operation. In an execution, two memory
operations from different threads form a **data race** if they *conflict*, at
least one of them is a *plain access* (non-atomic), and they are *unordered* in
the "happens-before" order according to the `LKMM
<../../tools/memory-model/Documentation/explanation.txt>`_.
Relationship with the Linux Kernel Memory Model (LKMM)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The LKMM defines the propagation and ordering rules of various memory
operations, which gives developers the ability to reason about concurrent code.
Ultimately this allows to determine the possible executions of concurrent code,
and if that code is free from data races.
KCSAN is aware of *atomic* accesses (``READ_ONCE``, ``WRITE_ONCE``,
``atomic_*``, etc.), but is oblivious of any ordering guarantees. In other
words, KCSAN assumes that as long as a plain access is not observed to race
with another conflicting access, memory operations are correctly ordered.
This means that KCSAN will not report *potential* data races due to missing
memory ordering. If, however, missing memory ordering (that is observable with
a particular compiler and architecture) leads to an observable data race (e.g.
entering a critical section erroneously), KCSAN would report the resulting
data race.
Race conditions vs. data races
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Race conditions are logic bugs, where unexpected interleaving of racing
concurrent operations result in an erroneous state.
Data races on the other hand are defined at the *memory model/language level*.
Many data races are also harmful race conditions, which a tool like KCSAN
reports! However, not all data races are race conditions and vice-versa.
KCSAN's intent is to report data races according to the LKMM. A data race
detector can only work at the memory model/language level.
Deeper analysis, to find high-level race conditions only, requires conveying
the intended kernel logic to a tool. This requires (1) the developer writing a
specification or model of their code, and then (2) the tool verifying that the
implementation matches. This has been done for small bits of code using model
checkers and other formal methods, but does not scale to the level of what can
be covered with a dynamic analysis based data race detector such as KCSAN.
For reasons outlined in this `article <https://lwn.net/Articles/793253/>`_,
data races can be much more subtle, but can cause no less harm than high-level
race conditions.
Implementation Details
----------------------
The general approach is inspired by `DataCollider
<http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf>`_.
Unlike DataCollider, KCSAN does not use hardware watchpoints, but instead
relies on compiler instrumentation. Watchpoints are implemented using an
efficient encoding that stores access type, size, and address in a long; the
benefits of using "soft watchpoints" are portability and greater flexibility in
limiting which accesses trigger a watchpoint.
More specifically, KCSAN requires instrumenting plain (unmarked, non-atomic)
memory operations; for each instrumented plain access:
1. Check if a matching watchpoint exists; if yes, and at least one access is a
write, then we encountered a racing access.
2. Periodically, if no matching watchpoint exists, set up a watchpoint and
stall for a small delay.
3. Also check the data value before the delay, and re-check the data value
after delay; if the values mismatch, we infer a race of unknown origin.
To detect data races between plain and atomic memory operations, KCSAN also
annotates atomic accesses, but only to check if a watchpoint exists
(``kcsan_check_atomic_*``); i.e. KCSAN never sets up a watchpoint on atomic
accesses.
Key Properties
~~~~~~~~~~~~~~
1. **Memory Overhead:** The current implementation uses a small array of longs
to encode watchpoint information, which is negligible.
2. **Performance Overhead:** KCSAN's runtime aims to be minimal, using an
efficient watchpoint encoding that does not require acquiring any shared
locks in the fast-path. For kernel boot on a system with 8 CPUs:
- 5.0x slow-down with the default KCSAN config;
- 2.8x slow-down from runtime fast-path overhead only (set very large
``KCSAN_SKIP_WATCH`` and unset ``KCSAN_SKIP_WATCH_RANDOMIZE``).
3. **Annotation Overheads:** Minimal annotations are required outside the KCSAN
runtime. As a result, maintenance overheads are minimal as the kernel
evolves.
4. **Detects Racy Writes from Devices:** Due to checking data values upon
setting up watchpoints, racy writes from devices can also be detected.
5. **Memory Ordering:** KCSAN is *not* explicitly aware of the LKMM's ordering
rules; this may result in missed data races (false negatives).
6. **Analysis Accuracy:** For observed executions, due to using a sampling
strategy, the analysis is *unsound* (false negatives possible), but aims to
be complete (no false positives).
Alternatives Considered
-----------------------
An alternative data race detection approach for the kernel can be found in
`Kernel Thread Sanitizer (KTSAN) <https://github.com/google/ktsan/wiki>`_.
KTSAN is a happens-before data race detector, which explicitly establishes the
happens-before order between memory operations, which can then be used to
determine data races as defined in `Data Races`_. To build a correct
happens-before relation, KTSAN must be aware of all ordering rules of the LKMM
and synchronization primitives. Unfortunately, any omission leads to false
positives, which is especially important in the context of the kernel which
includes numerous custom synchronization mechanisms. Furthermore, KTSAN's
implementation requires metadata for each memory location (shadow memory);
currently, for each page, KTSAN requires 4 pages of shadow memory.

View File

@ -8850,6 +8850,17 @@ F: Documentation/kbuild/kconfig*
F: scripts/kconfig/
F: scripts/Kconfig.include
KCSAN
M: Marco Elver <elver@google.com>
R: Dmitry Vyukov <dvyukov@google.com>
L: kasan-dev@googlegroups.com
S: Maintained
F: Documentation/dev-tools/kcsan.rst
F: include/linux/kcsan*.h
F: kernel/kcsan/
F: lib/Kconfig.kcsan
F: scripts/Makefile.kcsan
KDUMP
M: Dave Young <dyoung@redhat.com>
M: Baoquan He <bhe@redhat.com>

View File

@ -478,7 +478,7 @@ export KBUILD_HOSTCXXFLAGS KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS LDFLAGS_MODULE
export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS KBUILD_LDFLAGS
export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN CFLAGS_KCSAN
export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
@ -900,6 +900,7 @@ endif
include scripts/Makefile.kasan
include scripts/Makefile.extrawarn
include scripts/Makefile.ubsan
include scripts/Makefile.kcsan
# Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
KBUILD_CPPFLAGS += $(KCPPFLAGS)

View File

@ -226,6 +226,7 @@ config X86
select VIRT_TO_BUS
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
select HAVE_ARCH_KCSAN if X86_64
config INSTRUCTION_DECODER
def_bool y

View File

@ -9,7 +9,9 @@
# Changed by many, many contributors over the years.
#
# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
# Kernel does not boot with kcov instrumentation here.

View File

@ -17,7 +17,9 @@
# (see scripts/Makefile.lib size_append)
# compressed vmlinux.bin.all + u32 size of vmlinux.bin.all
# Sanitizer runtimes are unavailable and cannot be linked for early boot code.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.

View File

@ -10,8 +10,11 @@ ARCH_REL_TYPE_ABS += R_386_GLOB_DAT|R_386_JMP_SLOT|R_386_RELATIVE
include $(srctree)/lib/vdso/Makefile
KBUILD_CFLAGS += $(DISABLE_LTO)
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
UBSAN_SANITIZE := n
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.

View File

@ -201,8 +201,12 @@ arch_test_and_change_bit(long nr, volatile unsigned long *addr)
return GEN_BINARY_RMWcc(LOCK_PREFIX __ASM_SIZE(btc), *addr, c, "Ir", nr);
}
static __always_inline bool constant_test_bit(long nr, const volatile unsigned long *addr)
static __no_kcsan_or_inline bool constant_test_bit(long nr, const volatile unsigned long *addr)
{
/*
* Because this is a plain access, we need to disable KCSAN here to
* avoid double instrumentation via instrumented bitops.
*/
return ((1UL << (nr & (BITS_PER_LONG-1))) &
(addr[nr >> _BITOPS_LONG_SHIFT])) != 0;
}

View File

@ -28,6 +28,10 @@ KASAN_SANITIZE_dumpstack_$(BITS).o := n
KASAN_SANITIZE_stacktrace.o := n
KASAN_SANITIZE_paravirt.o := n
# With some compiler versions the generated code results in boot hangs, caused
# by several compilation units. To be safe, disable all instrumentation.
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o := y
OBJECT_FILES_NON_STANDARD_test_nx.o := y
OBJECT_FILES_NON_STANDARD_paravirt_patch.o := y

View File

@ -13,6 +13,9 @@ endif
KCOV_INSTRUMENT_common.o := n
KCOV_INSTRUMENT_perf_event.o := n
# As above, instrumenting secondary CPU boot code causes boot hangs.
KCSAN_SANITIZE_common.o := n
# Make sure load_percpu_segment has no stackprotector
nostackp := $(call cc-option, -fno-stack-protector)
CFLAGS_common.o := $(nostackp)

View File

@ -6,10 +6,14 @@
# Produces uninteresting flaky coverage.
KCOV_INSTRUMENT_delay.o := n
# KCSAN uses udelay for introducing watchpoint delay; avoid recursion.
KCSAN_SANITIZE_delay.o := n
# Early boot use of cmdline; don't instrument it
ifdef CONFIG_AMD_MEM_ENCRYPT
KCOV_INSTRUMENT_cmdline.o := n
KASAN_SANITIZE_cmdline.o := n
KCSAN_SANITIZE_cmdline.o := n
ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_cmdline.o = -pg

View File

@ -7,6 +7,10 @@ KCOV_INSTRUMENT_mem_encrypt_identity.o := n
KASAN_SANITIZE_mem_encrypt.o := n
KASAN_SANITIZE_mem_encrypt_identity.o := n
# Disable KCSAN entirely, because otherwise we get warnings that some functions
# reference __initdata sections.
KCSAN_SANITIZE := n
ifdef CONFIG_FUNCTION_TRACER
CFLAGS_REMOVE_mem_encrypt.o = -pg
CFLAGS_REMOVE_mem_encrypt_identity.o = -pg

View File

@ -17,7 +17,9 @@ CFLAGS_sha256.o := -D__DISABLE_EXPORTS
LDFLAGS_purgatory.ro := -e purgatory_start -r --no-undefined -nostdlib -z nodefaultlib
targets += purgatory.ro
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
KCOV_INSTRUMENT := n
# These are adjustments to the compiler flags used for objects that

View File

@ -6,7 +6,10 @@
# for more details.
#
#
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
subdir- := rm

View File

@ -6,7 +6,10 @@
# for more details.
#
#
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y
# Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.

View File

@ -31,7 +31,9 @@ KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
-D__DISABLE_EXPORTS
GCOV_PROFILE := n
# Sanitizer runtimes are unavailable and cannot be linked here.
KASAN_SANITIZE := n
KCSAN_SANITIZE := n
UBSAN_SANITIZE := n
OBJECT_FILES_NON_STANDARD := y

File diff suppressed because it is too large Load Diff

View File

@ -24,6 +24,15 @@
#define __no_sanitize_address
#endif
#if __has_feature(thread_sanitizer)
/* emulate gcc's __SANITIZE_THREAD__ flag */
#define __SANITIZE_THREAD__
#define __no_sanitize_thread \
__attribute__((no_sanitize("thread")))
#else
#define __no_sanitize_thread
#endif
/*
* Not all versions of clang implement the the type-generic versions
* of the builtin overflow checkers. Fortunately, clang implements

View File

@ -145,6 +145,13 @@
#define __no_sanitize_address
#endif
#if defined(__SANITIZE_THREAD__) && __has_attribute(__no_sanitize_thread__)
#define __no_sanitize_thread \
__attribute__((__noinline__)) __attribute__((no_sanitize_thread))
#else
#define __no_sanitize_thread
#endif
#if GCC_VERSION >= 50100
#define COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW 1
#endif

View File

@ -178,6 +178,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
#endif
#include <uapi/linux/types.h>
#include <linux/kcsan-checks.h>
#define __READ_ONCE_SIZE \
({ \
@ -193,12 +194,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
} \
})
static __always_inline
void __read_once_size(const volatile void *p, void *res, int size)
{
__READ_ONCE_SIZE;
}
#ifdef CONFIG_KASAN
/*
* We can't declare function 'inline' because __no_sanitize_address confilcts
@ -207,18 +202,44 @@ void __read_once_size(const volatile void *p, void *res, int size)
* '__maybe_unused' allows us to avoid defined-but-not-used warnings.
*/
# define __no_kasan_or_inline __no_sanitize_address notrace __maybe_unused
# define __no_sanitize_or_inline __no_kasan_or_inline
#else
# define __no_kasan_or_inline __always_inline
#endif
static __no_kasan_or_inline
#ifdef __SANITIZE_THREAD__
/*
* Rely on __SANITIZE_THREAD__ instead of CONFIG_KCSAN, to avoid not inlining in
* compilation units where instrumentation is disabled.
*/
# define __no_kcsan_or_inline __no_sanitize_thread notrace __maybe_unused
# define __no_sanitize_or_inline __no_kcsan_or_inline
#else
# define __no_kcsan_or_inline __always_inline
#endif
#ifndef __no_sanitize_or_inline
#define __no_sanitize_or_inline __always_inline
#endif
static __no_kcsan_or_inline
void __read_once_size(const volatile void *p, void *res, int size)
{
kcsan_check_atomic_read(p, size);
__READ_ONCE_SIZE;
}
static __no_sanitize_or_inline
void __read_once_size_nocheck(const volatile void *p, void *res, int size)
{
__READ_ONCE_SIZE;
}
static __always_inline void __write_once_size(volatile void *p, void *res, int size)
static __no_kcsan_or_inline
void __write_once_size(volatile void *p, void *res, int size)
{
kcsan_check_atomic_write(p, size);
switch (size) {
case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
@ -289,6 +310,26 @@ unsigned long read_word_at_a_time(const void *addr)
__u.__val; \
})
#include <linux/kcsan.h>
/*
* data_race: macro to document that accesses in an expression may conflict with
* other concurrent accesses resulting in data races, but the resulting
* behaviour is deemed safe regardless.
*
* This macro *does not* affect normal code generation, but is a hint to tooling
* that data races here should be ignored.
*/
#define data_race(expr) \
({ \
typeof(({ expr; })) __val; \
kcsan_nestable_atomic_begin(); \
__val = ({ expr; }); \
kcsan_nestable_atomic_end(); \
__val; \
})
#else
#endif /* __KERNEL__ */
/*

View File

@ -0,0 +1,97 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_KCSAN_CHECKS_H
#define _LINUX_KCSAN_CHECKS_H
#include <linux/types.h>
/*
* Access type modifiers.
*/
#define KCSAN_ACCESS_WRITE 0x1
#define KCSAN_ACCESS_ATOMIC 0x2
/*
* __kcsan_*: Always calls into runtime when KCSAN is enabled. This may be used
* even in compilation units that selectively disable KCSAN, but must use KCSAN
* to validate access to an address. Never use these in header files!
*/
#ifdef CONFIG_KCSAN
/**
* __kcsan_check_access - check generic access for data race
*
* @ptr address of access
* @size size of access
* @type access type modifier
*/
void __kcsan_check_access(const volatile void *ptr, size_t size, int type);
#else
static inline void __kcsan_check_access(const volatile void *ptr, size_t size,
int type) { }
#endif
/*
* kcsan_*: Only calls into runtime when the particular compilation unit has
* KCSAN instrumentation enabled. May be used in header files.
*/
#ifdef __SANITIZE_THREAD__
#define kcsan_check_access __kcsan_check_access
#else
static inline void kcsan_check_access(const volatile void *ptr, size_t size,
int type) { }
#endif
/**
* __kcsan_check_read - check regular read access for data races
*
* @ptr address of access
* @size size of access
*/
#define __kcsan_check_read(ptr, size) __kcsan_check_access(ptr, size, 0)
/**
* __kcsan_check_write - check regular write access for data races
*
* @ptr address of access
* @size size of access
*/
#define __kcsan_check_write(ptr, size) \
__kcsan_check_access(ptr, size, KCSAN_ACCESS_WRITE)
/**
* kcsan_check_read - check regular read access for data races
*
* @ptr address of access
* @size size of access
*/
#define kcsan_check_read(ptr, size) kcsan_check_access(ptr, size, 0)
/**
* kcsan_check_write - check regular write access for data races
*
* @ptr address of access
* @size size of access
*/
#define kcsan_check_write(ptr, size) \
kcsan_check_access(ptr, size, KCSAN_ACCESS_WRITE)
/*
* Check for atomic accesses: if atomic access are not ignored, this simply
* aliases to kcsan_check_access, otherwise becomes a no-op.
*/
#ifdef CONFIG_KCSAN_IGNORE_ATOMICS
#define kcsan_check_atomic_read(...) \
do { \
} while (0)
#define kcsan_check_atomic_write(...) \
do { \
} while (0)
#else
#define kcsan_check_atomic_read(ptr, size) \
kcsan_check_access(ptr, size, KCSAN_ACCESS_ATOMIC)
#define kcsan_check_atomic_write(ptr, size) \
kcsan_check_access(ptr, size, KCSAN_ACCESS_ATOMIC | KCSAN_ACCESS_WRITE)
#endif
#endif /* _LINUX_KCSAN_CHECKS_H */

115
include/linux/kcsan.h Normal file
View File

@ -0,0 +1,115 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_KCSAN_H
#define _LINUX_KCSAN_H
#include <linux/kcsan-checks.h>
#include <linux/types.h>
#ifdef CONFIG_KCSAN
/*
* Context for each thread of execution: for tasks, this is stored in
* task_struct, and interrupts access internal per-CPU storage.
*/
struct kcsan_ctx {
int disable_count; /* disable counter */
int atomic_next; /* number of following atomic ops */
/*
* We distinguish between: (a) nestable atomic regions that may contain
* other nestable regions; and (b) flat atomic regions that do not keep
* track of nesting. Both (a) and (b) are entirely independent of each
* other, and a flat region may be started in a nestable region or
* vice-versa.
*
* This is required because, for example, in the annotations for
* seqlocks, we declare seqlock writer critical sections as (a) nestable
* atomic regions, but reader critical sections as (b) flat atomic
* regions, but have encountered cases where seqlock reader critical
* sections are contained within writer critical sections (the opposite
* may be possible, too).
*
* To support these cases, we independently track the depth of nesting
* for (a), and whether the leaf level is flat for (b).
*/
int atomic_nest_count;
bool in_flat_atomic;
};
/**
* kcsan_init - initialize KCSAN runtime
*/
void kcsan_init(void);
/**
* kcsan_disable_current - disable KCSAN for the current context
*
* Supports nesting.
*/
void kcsan_disable_current(void);
/**
* kcsan_enable_current - re-enable KCSAN for the current context
*
* Supports nesting.
*/
void kcsan_enable_current(void);
/**
* kcsan_nestable_atomic_begin - begin nestable atomic region
*
* Accesses within the atomic region may appear to race with other accesses but
* should be considered atomic.
*/
void kcsan_nestable_atomic_begin(void);
/**
* kcsan_nestable_atomic_end - end nestable atomic region
*/
void kcsan_nestable_atomic_end(void);
/**
* kcsan_flat_atomic_begin - begin flat atomic region
*
* Accesses within the atomic region may appear to race with other accesses but
* should be considered atomic.
*/
void kcsan_flat_atomic_begin(void);
/**
* kcsan_flat_atomic_end - end flat atomic region
*/
void kcsan_flat_atomic_end(void);
/**
* kcsan_atomic_next - consider following accesses as atomic
*
* Force treating the next n memory accesses for the current context as atomic
* operations.
*
* @n number of following memory accesses to treat as atomic.
*/
void kcsan_atomic_next(int n);
#else /* CONFIG_KCSAN */
static inline void kcsan_init(void) { }
static inline void kcsan_disable_current(void) { }
static inline void kcsan_enable_current(void) { }
static inline void kcsan_nestable_atomic_begin(void) { }
static inline void kcsan_nestable_atomic_end(void) { }
static inline void kcsan_flat_atomic_begin(void) { }
static inline void kcsan_flat_atomic_end(void) { }
static inline void kcsan_atomic_next(int n) { }
#endif /* CONFIG_KCSAN */
#endif /* _LINUX_KCSAN_H */

View File

@ -31,6 +31,7 @@
#include <linux/task_io_accounting.h>
#include <linux/posix-timers.h>
#include <linux/rseq.h>
#include <linux/kcsan.h>
/* task_struct member predeclarations (sorted alphabetically): */
struct audit_context;
@ -1172,6 +1173,9 @@ struct task_struct {
#ifdef CONFIG_KASAN
unsigned int kasan_depth;
#endif
#ifdef CONFIG_KCSAN
struct kcsan_ctx kcsan_ctx;
#endif
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
/* Index of current stored address in ret_stack: */

View File

@ -37,8 +37,24 @@
#include <linux/preempt.h>
#include <linux/lockdep.h>
#include <linux/compiler.h>
#include <linux/kcsan.h>
#include <asm/processor.h>
/*
* The seqlock interface does not prescribe a precise sequence of read
* begin/retry/end. For readers, typically there is a call to
* read_seqcount_begin() and read_seqcount_retry(), however, there are more
* esoteric cases which do not follow this pattern.
*
* As a consequence, we take the following best-effort approach for raw usage
* via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
* pessimistically mark then next KCSAN_SEQLOCK_REGION_MAX memory accesses as
* atomics; if there is a matching read_seqcount_retry() call, no following
* memory operations are considered atomic. Usage of seqlocks via seqlock_t
* interface is not affected.
*/
#define KCSAN_SEQLOCK_REGION_MAX 1000
/*
* Version using sequence counter only.
* This can be used when code has its own mutex protecting the
@ -115,6 +131,7 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
cpu_relax();
goto repeat;
}
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
return ret;
}
@ -131,6 +148,7 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
{
unsigned ret = READ_ONCE(s->sequence);
smp_rmb();
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
return ret;
}
@ -183,6 +201,7 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
{
unsigned ret = READ_ONCE(s->sequence);
smp_rmb();
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
return ret & ~1;
}
@ -202,7 +221,8 @@ static inline unsigned raw_seqcount_begin(const seqcount_t *s)
*/
static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
{
return unlikely(s->sequence != start);
kcsan_atomic_next(0);
return unlikely(READ_ONCE(s->sequence) != start);
}
/**
@ -225,6 +245,7 @@ static inline int read_seqcount_retry(const seqcount_t *s, unsigned start)
static inline void raw_write_seqcount_begin(seqcount_t *s)
{
kcsan_nestable_atomic_begin();
s->sequence++;
smp_wmb();
}
@ -233,6 +254,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
{
smp_wmb();
s->sequence++;
kcsan_nestable_atomic_end();
}
/**
@ -243,6 +265,13 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
* usual consistency guarantee. It is one wmb cheaper, because we can
* collapse the two back-to-back wmb()s.
*
* Note that, writes surrounding the barrier should be declared atomic (e.g.
* via WRITE_ONCE): a) to ensure the writes become visible to other threads
* atomically, avoiding compiler optimizations; b) to document which writes are
* meant to propagate to the reader critical section. This is necessary because
* neither writes before and after the barrier are enclosed in a seq-writer
* critical section that would ensure readers are aware of ongoing writes.
*
* seqcount_t seq;
* bool X = true, Y = false;
*
@ -262,18 +291,20 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
*
* void write(void)
* {
* Y = true;
* WRITE_ONCE(Y, true);
*
* raw_write_seqcount_barrier(seq);
*
* X = false;
* WRITE_ONCE(X, false);
* }
*/
static inline void raw_write_seqcount_barrier(seqcount_t *s)
{
kcsan_nestable_atomic_begin();
s->sequence++;
smp_wmb();
s->sequence++;
kcsan_nestable_atomic_end();
}
static inline int raw_read_seqcount_latch(seqcount_t *s)
@ -398,7 +429,9 @@ static inline void write_seqcount_end(seqcount_t *s)
static inline void write_seqcount_invalidate(seqcount_t *s)
{
smp_wmb();
kcsan_nestable_atomic_begin();
s->sequence+=2;
kcsan_nestable_atomic_end();
}
typedef struct {
@ -430,11 +463,21 @@ typedef struct {
*/
static inline unsigned read_seqbegin(const seqlock_t *sl)
{
return read_seqcount_begin(&sl->seqcount);
unsigned ret = read_seqcount_begin(&sl->seqcount);
kcsan_atomic_next(0); /* non-raw usage, assume closing read_seqretry */
kcsan_flat_atomic_begin();
return ret;
}
static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
{
/*
* Assume not nested: read_seqretry may be called multiple times when
* completing read critical section.
*/
kcsan_flat_atomic_end();
return read_seqcount_retry(&sl->seqcount, start);
}

View File

@ -161,6 +161,14 @@ struct task_struct init_task
#ifdef CONFIG_KASAN
.kasan_depth = 1,
#endif
#ifdef CONFIG_KCSAN
.kcsan_ctx = {
.disable_count = 0,
.atomic_next = 0,
.atomic_nest_count = 0,
.in_flat_atomic = false,
},
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
.softirqs_enabled = 1,
#endif

View File

@ -93,6 +93,7 @@
#include <linux/rodata_test.h>
#include <linux/jump_label.h>
#include <linux/mem_encrypt.h>
#include <linux/kcsan.h>
#include <asm/io.h>
#include <asm/bugs.h>
@ -779,6 +780,7 @@ asmlinkage __visible void __init start_kernel(void)
acpi_subsystem_init();
arch_post_acpi_subsys_init();
sfi_init_late();
kcsan_init();
/* Do the rest non-__init'ed, we're now alive */
arch_call_rest_init();

View File

@ -23,6 +23,9 @@ endif
# Prevents flicker of uninteresting __do_softirq()/__local_bh_disable_ip()
# in coverage traces.
KCOV_INSTRUMENT_softirq.o := n
# Avoid KCSAN instrumentation in softirq ("No shared variables, all the data
# are CPU local" => assume no data races), to reduce overhead in interrupts.
KCSAN_SANITIZE_softirq.o = n
# These are called from save_stack_trace() on slub debug path,
# and produce insane amounts of uninteresting coverage.
KCOV_INSTRUMENT_module.o := n
@ -30,6 +33,7 @@ KCOV_INSTRUMENT_extable.o := n
# Don't self-instrument.
KCOV_INSTRUMENT_kcov.o := n
KASAN_SANITIZE_kcov.o := n
KCSAN_SANITIZE_kcov.o := n
CFLAGS_kcov.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
# cond_syscall is currently not LTO compatible
@ -102,6 +106,7 @@ obj-$(CONFIG_TRACEPOINTS) += trace/
obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_CPU_PM) += cpu_pm.o
obj-$(CONFIG_BPF) += bpf/
obj-$(CONFIG_KCSAN) += kcsan/
obj-$(CONFIG_PERF_EVENTS) += events/
@ -117,6 +122,7 @@ obj-$(CONFIG_RSEQ) += rseq.o
obj-$(CONFIG_GCC_PLUGIN_STACKLEAK) += stackleak.o
KASAN_SANITIZE_stackleak.o := n
KCSAN_SANITIZE_stackleak.o := n
KCOV_INSTRUMENT_stackleak.o := n
$(obj)/configs.o: $(obj)/config_data.gz

11
kernel/kcsan/Makefile Normal file
View File

@ -0,0 +1,11 @@
# SPDX-License-Identifier: GPL-2.0
KCSAN_SANITIZE := n
KCOV_INSTRUMENT := n
CFLAGS_REMOVE_core.o = $(CC_FLAGS_FTRACE)
CFLAGS_core.o := $(call cc-option,-fno-conserve-stack,) \
$(call cc-option,-fno-stack-protector,)
obj-y := core.o debugfs.o report.o
obj-$(CONFIG_KCSAN_SELFTEST) += test.o

27
kernel/kcsan/atomic.h Normal file
View File

@ -0,0 +1,27 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _KERNEL_KCSAN_ATOMIC_H
#define _KERNEL_KCSAN_ATOMIC_H
#include <linux/jiffies.h>
/*
* Helper that returns true if access to ptr should be considered as an atomic
* access, even though it is not explicitly atomic.
*
* List all volatile globals that have been observed in races, to suppress
* data race reports between accesses to these variables.
*
* For now, we assume that volatile accesses of globals are as strong as atomic
* accesses (READ_ONCE, WRITE_ONCE cast to volatile). The situation is still not
* entirely clear, as on some architectures (Alpha) READ_ONCE/WRITE_ONCE do more
* than cast to volatile. Eventually, we hope to be able to remove this
* function.
*/
static inline bool kcsan_is_atomic(const volatile void *ptr)
{
/* only jiffies for now */
return ptr == &jiffies;
}
#endif /* _KERNEL_KCSAN_ATOMIC_H */

626
kernel/kcsan/core.c Normal file
View File

@ -0,0 +1,626 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/delay.h>
#include <linux/export.h>
#include <linux/init.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/random.h>
#include <linux/sched.h>
#include <linux/uaccess.h>
#include "atomic.h"
#include "encoding.h"
#include "kcsan.h"
bool kcsan_enabled;
/* Per-CPU kcsan_ctx for interrupts */
static DEFINE_PER_CPU(struct kcsan_ctx, kcsan_cpu_ctx) = {
.disable_count = 0,
.atomic_next = 0,
.atomic_nest_count = 0,
.in_flat_atomic = false,
};
/*
* Helper macros to index into adjacent slots slots, starting from address slot
* itself, followed by the right and left slots.
*
* The purpose is 2-fold:
*
* 1. if during insertion the address slot is already occupied, check if
* any adjacent slots are free;
* 2. accesses that straddle a slot boundary due to size that exceeds a
* slot's range may check adjacent slots if any watchpoint matches.
*
* Note that accesses with very large size may still miss a watchpoint; however,
* given this should be rare, this is a reasonable trade-off to make, since this
* will avoid:
*
* 1. excessive contention between watchpoint checks and setup;
* 2. larger number of simultaneous watchpoints without sacrificing
* performance.
*
* Example: SLOT_IDX values for KCSAN_CHECK_ADJACENT=1, where i is [0, 1, 2]:
*
* slot=0: [ 1, 2, 0]
* slot=9: [10, 11, 9]
* slot=63: [64, 65, 63]
*/
#define NUM_SLOTS (1 + 2 * KCSAN_CHECK_ADJACENT)
#define SLOT_IDX(slot, i) (slot + ((i + KCSAN_CHECK_ADJACENT) % NUM_SLOTS))
/*
* SLOT_IDX_FAST is used in fast-path. Not first checking the address's primary
* slot (middle) is fine if we assume that data races occur rarely. The set of
* indices {SLOT_IDX(slot, i) | i in [0, NUM_SLOTS)} is equivalent to
* {SLOT_IDX_FAST(slot, i) | i in [0, NUM_SLOTS)}.
*/
#define SLOT_IDX_FAST(slot, i) (slot + i)
/*
* Watchpoints, with each entry encoded as defined in encoding.h: in order to be
* able to safely update and access a watchpoint without introducing locking
* overhead, we encode each watchpoint as a single atomic long. The initial
* zero-initialized state matches INVALID_WATCHPOINT.
*
* Add NUM_SLOTS-1 entries to account for overflow; this helps avoid having to
* use more complicated SLOT_IDX_FAST calculation with modulo in fast-path.
*/
static atomic_long_t watchpoints[CONFIG_KCSAN_NUM_WATCHPOINTS + NUM_SLOTS - 1];
/*
* Instructions to skip watching counter, used in should_watch(). We use a
* per-CPU counter to avoid excessive contention.
*/
static DEFINE_PER_CPU(long, kcsan_skip);
static inline atomic_long_t *find_watchpoint(unsigned long addr, size_t size,
bool expect_write,
long *encoded_watchpoint)
{
const int slot = watchpoint_slot(addr);
const unsigned long addr_masked = addr & WATCHPOINT_ADDR_MASK;
atomic_long_t *watchpoint;
unsigned long wp_addr_masked;
size_t wp_size;
bool is_write;
int i;
BUILD_BUG_ON(CONFIG_KCSAN_NUM_WATCHPOINTS < NUM_SLOTS);
for (i = 0; i < NUM_SLOTS; ++i) {
watchpoint = &watchpoints[SLOT_IDX_FAST(slot, i)];
*encoded_watchpoint = atomic_long_read(watchpoint);
if (!decode_watchpoint(*encoded_watchpoint, &wp_addr_masked,
&wp_size, &is_write))
continue;
if (expect_write && !is_write)
continue;
/* Check if the watchpoint matches the access. */
if (matching_access(wp_addr_masked, wp_size, addr_masked, size))
return watchpoint;
}
return NULL;
}
static inline atomic_long_t *insert_watchpoint(unsigned long addr, size_t size,
bool is_write)
{
const int slot = watchpoint_slot(addr);
const long encoded_watchpoint = encode_watchpoint(addr, size, is_write);
atomic_long_t *watchpoint;
int i;
/* Check slot index logic, ensuring we stay within array bounds. */
BUILD_BUG_ON(SLOT_IDX(0, 0) != KCSAN_CHECK_ADJACENT);
BUILD_BUG_ON(SLOT_IDX(0, KCSAN_CHECK_ADJACENT + 1) != 0);
BUILD_BUG_ON(SLOT_IDX(CONFIG_KCSAN_NUM_WATCHPOINTS - 1,
KCSAN_CHECK_ADJACENT) !=
ARRAY_SIZE(watchpoints) - 1);
BUILD_BUG_ON(SLOT_IDX(CONFIG_KCSAN_NUM_WATCHPOINTS - 1,
KCSAN_CHECK_ADJACENT + 1) !=
ARRAY_SIZE(watchpoints) - NUM_SLOTS);
for (i = 0; i < NUM_SLOTS; ++i) {
long expect_val = INVALID_WATCHPOINT;
/* Try to acquire this slot. */
watchpoint = &watchpoints[SLOT_IDX(slot, i)];
if (atomic_long_try_cmpxchg_relaxed(watchpoint, &expect_val,
encoded_watchpoint))
return watchpoint;
}
return NULL;
}
/*
* Return true if watchpoint was successfully consumed, false otherwise.
*
* This may return false if:
*
* 1. another thread already consumed the watchpoint;
* 2. the thread that set up the watchpoint already removed it;
* 3. the watchpoint was removed and then re-used.
*/
static inline bool try_consume_watchpoint(atomic_long_t *watchpoint,
long encoded_watchpoint)
{
return atomic_long_try_cmpxchg_relaxed(watchpoint, &encoded_watchpoint,
CONSUMED_WATCHPOINT);
}
/*
* Return true if watchpoint was not touched, false if consumed.
*/
static inline bool remove_watchpoint(atomic_long_t *watchpoint)
{
return atomic_long_xchg_relaxed(watchpoint, INVALID_WATCHPOINT) !=
CONSUMED_WATCHPOINT;
}
static inline struct kcsan_ctx *get_ctx(void)
{
/*
* In interrupt, use raw_cpu_ptr to avoid unnecessary checks, that would
* also result in calls that generate warnings in uaccess regions.
*/
return in_task() ? &current->kcsan_ctx : raw_cpu_ptr(&kcsan_cpu_ctx);
}
static inline bool is_atomic(const volatile void *ptr)
{
struct kcsan_ctx *ctx = get_ctx();
if (unlikely(ctx->atomic_next > 0)) {
/*
* Because we do not have separate contexts for nested
* interrupts, in case atomic_next is set, we simply assume that
* the outer interrupt set atomic_next. In the worst case, we
* will conservatively consider operations as atomic. This is a
* reasonable trade-off to make, since this case should be
* extremely rare; however, even if extremely rare, it could
* lead to false positives otherwise.
*/
if ((hardirq_count() >> HARDIRQ_SHIFT) < 2)
--ctx->atomic_next; /* in task, or outer interrupt */
return true;
}
if (unlikely(ctx->atomic_nest_count > 0 || ctx->in_flat_atomic))
return true;
return kcsan_is_atomic(ptr);
}
static inline bool should_watch(const volatile void *ptr, int type)
{
/*
* Never set up watchpoints when memory operations are atomic.
*
* Need to check this first, before kcsan_skip check below: (1) atomics
* should not count towards skipped instructions, and (2) to actually
* decrement kcsan_atomic_next for consecutive instruction stream.
*/
if ((type & KCSAN_ACCESS_ATOMIC) != 0 || is_atomic(ptr))
return false;
if (this_cpu_dec_return(kcsan_skip) >= 0)
return false;
/*
* NOTE: If we get here, kcsan_skip must always be reset in slow path
* via reset_kcsan_skip() to avoid underflow.
*/
/* this operation should be watched */
return true;
}
static inline void reset_kcsan_skip(void)
{
long skip_count = CONFIG_KCSAN_SKIP_WATCH -
(IS_ENABLED(CONFIG_KCSAN_SKIP_WATCH_RANDOMIZE) ?
prandom_u32_max(CONFIG_KCSAN_SKIP_WATCH) :
0);
this_cpu_write(kcsan_skip, skip_count);
}
static inline bool kcsan_is_enabled(void)
{
return READ_ONCE(kcsan_enabled) && get_ctx()->disable_count == 0;
}
static inline unsigned int get_delay(void)
{
unsigned int delay = in_task() ? CONFIG_KCSAN_UDELAY_TASK :
CONFIG_KCSAN_UDELAY_INTERRUPT;
return delay - (IS_ENABLED(CONFIG_KCSAN_DELAY_RANDOMIZE) ?
prandom_u32_max(delay) :
0);
}
/*
* Pull everything together: check_access() below contains the performance
* critical operations; the fast-path (including check_access) functions should
* all be inlinable by the instrumentation functions.
*
* The slow-path (kcsan_found_watchpoint, kcsan_setup_watchpoint) are
* non-inlinable -- note that, we prefix these with "kcsan_" to ensure they can
* be filtered from the stacktrace, as well as give them unique names for the
* UACCESS whitelist of objtool. Each function uses user_access_save/restore(),
* since they do not access any user memory, but instrumentation is still
* emitted in UACCESS regions.
*/
static noinline void kcsan_found_watchpoint(const volatile void *ptr,
size_t size, bool is_write,
atomic_long_t *watchpoint,
long encoded_watchpoint)
{
unsigned long flags;
bool consumed;
if (!kcsan_is_enabled())
return;
/*
* Consume the watchpoint as soon as possible, to minimize the chances
* of !consumed. Consuming the watchpoint must always be guarded by
* kcsan_is_enabled() check, as otherwise we might erroneously
* triggering reports when disabled.
*/
consumed = try_consume_watchpoint(watchpoint, encoded_watchpoint);
/* keep this after try_consume_watchpoint */
flags = user_access_save();
if (consumed) {
kcsan_report(ptr, size, is_write, true, raw_smp_processor_id(),
KCSAN_REPORT_CONSUMED_WATCHPOINT);
} else {
/*
* The other thread may not print any diagnostics, as it has
* already removed the watchpoint, or another thread consumed
* the watchpoint before this thread.
*/
kcsan_counter_inc(KCSAN_COUNTER_REPORT_RACES);
}
kcsan_counter_inc(KCSAN_COUNTER_DATA_RACES);
user_access_restore(flags);
}
static noinline void kcsan_setup_watchpoint(const volatile void *ptr,
size_t size, bool is_write)
{
atomic_long_t *watchpoint;
union {
u8 _1;
u16 _2;
u32 _4;
u64 _8;
} expect_value;
bool value_change = false;
unsigned long ua_flags = user_access_save();
unsigned long irq_flags;
/*
* Always reset kcsan_skip counter in slow-path to avoid underflow; see
* should_watch().
*/
reset_kcsan_skip();
if (!kcsan_is_enabled())
goto out;
if (!check_encodable((unsigned long)ptr, size)) {
kcsan_counter_inc(KCSAN_COUNTER_UNENCODABLE_ACCESSES);
goto out;
}
/*
* Disable interrupts & preemptions to avoid another thread on the same
* CPU accessing memory locations for the set up watchpoint; this is to
* avoid reporting races to e.g. CPU-local data.
*
* An alternative would be adding the source CPU to the watchpoint
* encoding, and checking that watchpoint-CPU != this-CPU. There are
* several problems with this:
* 1. we should avoid stealing more bits from the watchpoint encoding
* as it would affect accuracy, as well as increase performance
* overhead in the fast-path;
* 2. if we are preempted, but there *is* a genuine data race, we
* would *not* report it -- since this is the common case (vs.
* CPU-local data accesses), it makes more sense (from a data race
* detection point of view) to simply disable preemptions to ensure
* as many tasks as possible run on other CPUs.
*/
local_irq_save(irq_flags);
watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write);
if (watchpoint == NULL) {
/*
* Out of capacity: the size of `watchpoints`, and the frequency
* with which `should_watch()` returns true should be tweaked so
* that this case happens very rarely.
*/
kcsan_counter_inc(KCSAN_COUNTER_NO_CAPACITY);
goto out_unlock;
}
kcsan_counter_inc(KCSAN_COUNTER_SETUP_WATCHPOINTS);
kcsan_counter_inc(KCSAN_COUNTER_USED_WATCHPOINTS);
/*
* Read the current value, to later check and infer a race if the data
* was modified via a non-instrumented access, e.g. from a device.
*/
switch (size) {
case 1:
expect_value._1 = READ_ONCE(*(const u8 *)ptr);
break;
case 2:
expect_value._2 = READ_ONCE(*(const u16 *)ptr);
break;
case 4:
expect_value._4 = READ_ONCE(*(const u32 *)ptr);
break;
case 8:
expect_value._8 = READ_ONCE(*(const u64 *)ptr);
break;
default:
break; /* ignore; we do not diff the values */
}
if (IS_ENABLED(CONFIG_KCSAN_DEBUG)) {
kcsan_disable_current();
pr_err("KCSAN: watching %s, size: %zu, addr: %px [slot: %d, encoded: %lx]\n",
is_write ? "write" : "read", size, ptr,
watchpoint_slot((unsigned long)ptr),
encode_watchpoint((unsigned long)ptr, size, is_write));
kcsan_enable_current();
}
/*
* Delay this thread, to increase probability of observing a racy
* conflicting access.
*/
udelay(get_delay());
/*
* Re-read value, and check if it is as expected; if not, we infer a
* racy access.
*/
switch (size) {
case 1:
value_change = expect_value._1 != READ_ONCE(*(const u8 *)ptr);
break;
case 2:
value_change = expect_value._2 != READ_ONCE(*(const u16 *)ptr);
break;
case 4:
value_change = expect_value._4 != READ_ONCE(*(const u32 *)ptr);
break;
case 8:
value_change = expect_value._8 != READ_ONCE(*(const u64 *)ptr);
break;
default:
break; /* ignore; we do not diff the values */
}
/* Check if this access raced with another. */
if (!remove_watchpoint(watchpoint)) {
/*
* No need to increment 'data_races' counter, as the racing
* thread already did.
*/
kcsan_report(ptr, size, is_write, size > 8 || value_change,
smp_processor_id(), KCSAN_REPORT_RACE_SIGNAL);
} else if (value_change) {
/* Inferring a race, since the value should not have changed. */
kcsan_counter_inc(KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN);
if (IS_ENABLED(CONFIG_KCSAN_REPORT_RACE_UNKNOWN_ORIGIN))
kcsan_report(ptr, size, is_write, true,
smp_processor_id(),
KCSAN_REPORT_RACE_UNKNOWN_ORIGIN);
}
kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS);
out_unlock:
local_irq_restore(irq_flags);
out:
user_access_restore(ua_flags);
}
static __always_inline void check_access(const volatile void *ptr, size_t size,
int type)
{
const bool is_write = (type & KCSAN_ACCESS_WRITE) != 0;
atomic_long_t *watchpoint;
long encoded_watchpoint;
/*
* Avoid user_access_save in fast-path: find_watchpoint is safe without
* user_access_save, as the address that ptr points to is only used to
* check if a watchpoint exists; ptr is never dereferenced.
*/
watchpoint = find_watchpoint((unsigned long)ptr, size, !is_write,
&encoded_watchpoint);
/*
* It is safe to check kcsan_is_enabled() after find_watchpoint in the
* slow-path, as long as no state changes that cause a data race to be
* detected and reported have occurred until kcsan_is_enabled() is
* checked.
*/
if (unlikely(watchpoint != NULL))
kcsan_found_watchpoint(ptr, size, is_write, watchpoint,
encoded_watchpoint);
else if (unlikely(should_watch(ptr, type)))
kcsan_setup_watchpoint(ptr, size, is_write);
}
/* === Public interface ===================================================== */
void __init kcsan_init(void)
{
BUG_ON(!in_task());
kcsan_debugfs_init();
/*
* We are in the init task, and no other tasks should be running;
* WRITE_ONCE without memory barrier is sufficient.
*/
if (IS_ENABLED(CONFIG_KCSAN_EARLY_ENABLE))
WRITE_ONCE(kcsan_enabled, true);
}
/* === Exported interface =================================================== */
void kcsan_disable_current(void)
{
++get_ctx()->disable_count;
}
EXPORT_SYMBOL(kcsan_disable_current);
void kcsan_enable_current(void)
{
if (get_ctx()->disable_count-- == 0) {
/*
* Warn if kcsan_enable_current() calls are unbalanced with
* kcsan_disable_current() calls, which causes disable_count to
* become negative and should not happen.
*/
kcsan_disable_current(); /* restore to 0, KCSAN still enabled */
kcsan_disable_current(); /* disable to generate warning */
WARN(1, "Unbalanced %s()", __func__);
kcsan_enable_current();
}
}
EXPORT_SYMBOL(kcsan_enable_current);
void kcsan_nestable_atomic_begin(void)
{
/*
* Do *not* check and warn if we are in a flat atomic region: nestable
* and flat atomic regions are independent from each other.
* See include/linux/kcsan.h: struct kcsan_ctx comments for more
* comments.
*/
++get_ctx()->atomic_nest_count;
}
EXPORT_SYMBOL(kcsan_nestable_atomic_begin);
void kcsan_nestable_atomic_end(void)
{
if (get_ctx()->atomic_nest_count-- == 0) {
/*
* Warn if kcsan_nestable_atomic_end() calls are unbalanced with
* kcsan_nestable_atomic_begin() calls, which causes
* atomic_nest_count to become negative and should not happen.
*/
kcsan_nestable_atomic_begin(); /* restore to 0 */
kcsan_disable_current(); /* disable to generate warning */
WARN(1, "Unbalanced %s()", __func__);
kcsan_enable_current();
}
}
EXPORT_SYMBOL(kcsan_nestable_atomic_end);
void kcsan_flat_atomic_begin(void)
{
get_ctx()->in_flat_atomic = true;
}
EXPORT_SYMBOL(kcsan_flat_atomic_begin);
void kcsan_flat_atomic_end(void)
{
get_ctx()->in_flat_atomic = false;
}
EXPORT_SYMBOL(kcsan_flat_atomic_end);
void kcsan_atomic_next(int n)
{
get_ctx()->atomic_next = n;
}
EXPORT_SYMBOL(kcsan_atomic_next);
void __kcsan_check_access(const volatile void *ptr, size_t size, int type)
{
check_access(ptr, size, type);
}
EXPORT_SYMBOL(__kcsan_check_access);
/*
* KCSAN uses the same instrumentation that is emitted by supported compilers
* for ThreadSanitizer (TSAN).
*
* When enabled, the compiler emits instrumentation calls (the functions
* prefixed with "__tsan" below) for all loads and stores that it generated;
* inline asm is not instrumented.
*
* Note that, not all supported compiler versions distinguish aligned/unaligned
* accesses, but e.g. recent versions of Clang do. We simply alias the unaligned
* version to the generic version, which can handle both.
*/
#define DEFINE_TSAN_READ_WRITE(size) \
void __tsan_read##size(void *ptr) \
{ \
check_access(ptr, size, 0); \
} \
EXPORT_SYMBOL(__tsan_read##size); \
void __tsan_unaligned_read##size(void *ptr) \
__alias(__tsan_read##size); \
EXPORT_SYMBOL(__tsan_unaligned_read##size); \
void __tsan_write##size(void *ptr) \
{ \
check_access(ptr, size, KCSAN_ACCESS_WRITE); \
} \
EXPORT_SYMBOL(__tsan_write##size); \
void __tsan_unaligned_write##size(void *ptr) \
__alias(__tsan_write##size); \
EXPORT_SYMBOL(__tsan_unaligned_write##size)
DEFINE_TSAN_READ_WRITE(1);
DEFINE_TSAN_READ_WRITE(2);
DEFINE_TSAN_READ_WRITE(4);
DEFINE_TSAN_READ_WRITE(8);
DEFINE_TSAN_READ_WRITE(16);
void __tsan_read_range(void *ptr, size_t size)
{
check_access(ptr, size, 0);
}
EXPORT_SYMBOL(__tsan_read_range);
void __tsan_write_range(void *ptr, size_t size)
{
check_access(ptr, size, KCSAN_ACCESS_WRITE);
}
EXPORT_SYMBOL(__tsan_write_range);
/*
* The below are not required by KCSAN, but can still be emitted by the
* compiler.
*/
void __tsan_func_entry(void *call_pc)
{
}
EXPORT_SYMBOL(__tsan_func_entry);
void __tsan_func_exit(void)
{
}
EXPORT_SYMBOL(__tsan_func_exit);
void __tsan_init(void)
{
}
EXPORT_SYMBOL(__tsan_init);

275
kernel/kcsan/debugfs.c Normal file
View File

@ -0,0 +1,275 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/atomic.h>
#include <linux/bsearch.h>
#include <linux/bug.h>
#include <linux/debugfs.h>
#include <linux/init.h>
#include <linux/kallsyms.h>
#include <linux/seq_file.h>
#include <linux/slab.h>
#include <linux/sort.h>
#include <linux/string.h>
#include <linux/uaccess.h>
#include "kcsan.h"
/*
* Statistics counters.
*/
static atomic_long_t counters[KCSAN_COUNTER_COUNT];
/*
* Addresses for filtering functions from reporting. This list can be used as a
* whitelist or blacklist.
*/
static struct {
unsigned long *addrs; /* array of addresses */
size_t size; /* current size */
int used; /* number of elements used */
bool sorted; /* if elements are sorted */
bool whitelist; /* if list is a blacklist or whitelist */
} report_filterlist = {
.addrs = NULL,
.size = 8, /* small initial size */
.used = 0,
.sorted = false,
.whitelist = false, /* default is blacklist */
};
static DEFINE_SPINLOCK(report_filterlist_lock);
static const char *counter_to_name(enum kcsan_counter_id id)
{
switch (id) {
case KCSAN_COUNTER_USED_WATCHPOINTS:
return "used_watchpoints";
case KCSAN_COUNTER_SETUP_WATCHPOINTS:
return "setup_watchpoints";
case KCSAN_COUNTER_DATA_RACES:
return "data_races";
case KCSAN_COUNTER_NO_CAPACITY:
return "no_capacity";
case KCSAN_COUNTER_REPORT_RACES:
return "report_races";
case KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN:
return "races_unknown_origin";
case KCSAN_COUNTER_UNENCODABLE_ACCESSES:
return "unencodable_accesses";
case KCSAN_COUNTER_ENCODING_FALSE_POSITIVES:
return "encoding_false_positives";
case KCSAN_COUNTER_COUNT:
BUG();
}
return NULL;
}
void kcsan_counter_inc(enum kcsan_counter_id id)
{
atomic_long_inc(&counters[id]);
}
void kcsan_counter_dec(enum kcsan_counter_id id)
{
atomic_long_dec(&counters[id]);
}
/*
* The microbenchmark allows benchmarking KCSAN core runtime only. To run
* multiple threads, pipe 'microbench=<iters>' from multiple tasks into the
* debugfs file.
*/
static void microbenchmark(unsigned long iters)
{
cycles_t cycles;
pr_info("KCSAN: %s begin | iters: %lu\n", __func__, iters);
cycles = get_cycles();
while (iters--) {
/*
* We can run this benchmark from multiple tasks; this address
* calculation increases likelyhood of some accesses overlapping
* (they still won't conflict because all are reads).
*/
unsigned long addr =
iters % (CONFIG_KCSAN_NUM_WATCHPOINTS * PAGE_SIZE);
__kcsan_check_read((void *)addr, sizeof(long));
}
cycles = get_cycles() - cycles;
pr_info("KCSAN: %s end | cycles: %llu\n", __func__, cycles);
}
static int cmp_filterlist_addrs(const void *rhs, const void *lhs)
{
const unsigned long a = *(const unsigned long *)rhs;
const unsigned long b = *(const unsigned long *)lhs;
return a < b ? -1 : a == b ? 0 : 1;
}
bool kcsan_skip_report_debugfs(unsigned long func_addr)
{
unsigned long symbolsize, offset;
unsigned long flags;
bool ret = false;
if (!kallsyms_lookup_size_offset(func_addr, &symbolsize, &offset))
return false;
func_addr -= offset; /* get function start */
spin_lock_irqsave(&report_filterlist_lock, flags);
if (report_filterlist.used == 0)
goto out;
/* Sort array if it is unsorted, and then do a binary search. */
if (!report_filterlist.sorted) {
sort(report_filterlist.addrs, report_filterlist.used,
sizeof(unsigned long), cmp_filterlist_addrs, NULL);
report_filterlist.sorted = true;
}
ret = !!bsearch(&func_addr, report_filterlist.addrs,
report_filterlist.used, sizeof(unsigned long),
cmp_filterlist_addrs);
if (report_filterlist.whitelist)
ret = !ret;
out:
spin_unlock_irqrestore(&report_filterlist_lock, flags);
return ret;
}
static void set_report_filterlist_whitelist(bool whitelist)
{
unsigned long flags;
spin_lock_irqsave(&report_filterlist_lock, flags);
report_filterlist.whitelist = whitelist;
spin_unlock_irqrestore(&report_filterlist_lock, flags);
}
/* Returns 0 on success, error-code otherwise. */
static ssize_t insert_report_filterlist(const char *func)
{
unsigned long flags;
unsigned long addr = kallsyms_lookup_name(func);
ssize_t ret = 0;
if (!addr) {
pr_err("KCSAN: could not find function: '%s'\n", func);
return -ENOENT;
}
spin_lock_irqsave(&report_filterlist_lock, flags);
if (report_filterlist.addrs == NULL) {
/* initial allocation */
report_filterlist.addrs =
kmalloc_array(report_filterlist.size,
sizeof(unsigned long), GFP_KERNEL);
if (report_filterlist.addrs == NULL) {
ret = -ENOMEM;
goto out;
}
} else if (report_filterlist.used == report_filterlist.size) {
/* resize filterlist */
size_t new_size = report_filterlist.size * 2;
unsigned long *new_addrs =
krealloc(report_filterlist.addrs,
new_size * sizeof(unsigned long), GFP_KERNEL);
if (new_addrs == NULL) {
/* leave filterlist itself untouched */
ret = -ENOMEM;
goto out;
}
report_filterlist.size = new_size;
report_filterlist.addrs = new_addrs;
}
/* Note: deduplicating should be done in userspace. */
report_filterlist.addrs[report_filterlist.used++] =
kallsyms_lookup_name(func);
report_filterlist.sorted = false;
out:
spin_unlock_irqrestore(&report_filterlist_lock, flags);
return ret;
}
static int show_info(struct seq_file *file, void *v)
{
int i;
unsigned long flags;
/* show stats */
seq_printf(file, "enabled: %i\n", READ_ONCE(kcsan_enabled));
for (i = 0; i < KCSAN_COUNTER_COUNT; ++i)
seq_printf(file, "%s: %ld\n", counter_to_name(i),
atomic_long_read(&counters[i]));
/* show filter functions, and filter type */
spin_lock_irqsave(&report_filterlist_lock, flags);
seq_printf(file, "\n%s functions: %s\n",
report_filterlist.whitelist ? "whitelisted" : "blacklisted",
report_filterlist.used == 0 ? "none" : "");
for (i = 0; i < report_filterlist.used; ++i)
seq_printf(file, " %ps\n", (void *)report_filterlist.addrs[i]);
spin_unlock_irqrestore(&report_filterlist_lock, flags);
return 0;
}
static int debugfs_open(struct inode *inode, struct file *file)
{
return single_open(file, show_info, NULL);
}
static ssize_t debugfs_write(struct file *file, const char __user *buf,
size_t count, loff_t *off)
{
char kbuf[KSYM_NAME_LEN];
char *arg;
int read_len = count < (sizeof(kbuf) - 1) ? count : (sizeof(kbuf) - 1);
if (copy_from_user(kbuf, buf, read_len))
return -EFAULT;
kbuf[read_len] = '\0';
arg = strstrip(kbuf);
if (!strcmp(arg, "on")) {
WRITE_ONCE(kcsan_enabled, true);
} else if (!strcmp(arg, "off")) {
WRITE_ONCE(kcsan_enabled, false);
} else if (!strncmp(arg, "microbench=", sizeof("microbench=") - 1)) {
unsigned long iters;
if (kstrtoul(&arg[sizeof("microbench=") - 1], 0, &iters))
return -EINVAL;
microbenchmark(iters);
} else if (!strcmp(arg, "whitelist")) {
set_report_filterlist_whitelist(true);
} else if (!strcmp(arg, "blacklist")) {
set_report_filterlist_whitelist(false);
} else if (arg[0] == '!') {
ssize_t ret = insert_report_filterlist(&arg[1]);
if (ret < 0)
return ret;
} else {
return -EINVAL;
}
return count;
}
static const struct file_operations debugfs_ops = { .read = seq_read,
.open = debugfs_open,
.write = debugfs_write,
.release = single_release };
void __init kcsan_debugfs_init(void)
{
debugfs_create_file("kcsan", 0644, NULL, NULL, &debugfs_ops);
}

94
kernel/kcsan/encoding.h Normal file
View File

@ -0,0 +1,94 @@
/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _KERNEL_KCSAN_ENCODING_H
#define _KERNEL_KCSAN_ENCODING_H
#include <linux/bits.h>
#include <linux/log2.h>
#include <linux/mm.h>
#include "kcsan.h"
#define SLOT_RANGE PAGE_SIZE
#define INVALID_WATCHPOINT 0
#define CONSUMED_WATCHPOINT 1
/*
* The maximum useful size of accesses for which we set up watchpoints is the
* max range of slots we check on an access.
*/
#define MAX_ENCODABLE_SIZE (SLOT_RANGE * (1 + KCSAN_CHECK_ADJACENT))
/*
* Number of bits we use to store size info.
*/
#define WATCHPOINT_SIZE_BITS bits_per(MAX_ENCODABLE_SIZE)
/*
* This encoding for addresses discards the upper (1 for is-write + SIZE_BITS);
* however, most 64-bit architectures do not use the full 64-bit address space.
* Also, in order for a false positive to be observable 2 things need to happen:
*
* 1. different addresses but with the same encoded address race;
* 2. and both map onto the same watchpoint slots;
*
* Both these are assumed to be very unlikely. However, in case it still happens
* happens, the report logic will filter out the false positive (see report.c).
*/
#define WATCHPOINT_ADDR_BITS (BITS_PER_LONG - 1 - WATCHPOINT_SIZE_BITS)
/*
* Masks to set/retrieve the encoded data.
*/
#define WATCHPOINT_WRITE_MASK BIT(BITS_PER_LONG - 1)
#define WATCHPOINT_SIZE_MASK \
GENMASK(BITS_PER_LONG - 2, BITS_PER_LONG - 2 - WATCHPOINT_SIZE_BITS)
#define WATCHPOINT_ADDR_MASK \
GENMASK(BITS_PER_LONG - 3 - WATCHPOINT_SIZE_BITS, 0)
static inline bool check_encodable(unsigned long addr, size_t size)
{
return size <= MAX_ENCODABLE_SIZE;
}
static inline long encode_watchpoint(unsigned long addr, size_t size,
bool is_write)
{
return (long)((is_write ? WATCHPOINT_WRITE_MASK : 0) |
(size << WATCHPOINT_ADDR_BITS) |
(addr & WATCHPOINT_ADDR_MASK));
}
static inline bool decode_watchpoint(long watchpoint,
unsigned long *addr_masked, size_t *size,
bool *is_write)
{
if (watchpoint == INVALID_WATCHPOINT ||
watchpoint == CONSUMED_WATCHPOINT)
return false;
*addr_masked = (unsigned long)watchpoint & WATCHPOINT_ADDR_MASK;
*size = ((unsigned long)watchpoint & WATCHPOINT_SIZE_MASK) >>
WATCHPOINT_ADDR_BITS;
*is_write = !!((unsigned long)watchpoint & WATCHPOINT_WRITE_MASK);
return true;
}
/*
* Return watchpoint slot for an address.
*/
static inline int watchpoint_slot(unsigned long addr)
{
return (addr / PAGE_SIZE) % CONFIG_KCSAN_NUM_WATCHPOINTS;
}
static inline bool matching_access(unsigned long addr1, size_t size1,
unsigned long addr2, size_t size2)
{
unsigned long end_range1 = addr1 + size1 - 1;
unsigned long end_range2 = addr2 + size2 - 1;
return addr1 <= end_range2 && addr2 <= end_range1;
}
#endif /* _KERNEL_KCSAN_ENCODING_H */

108
kernel/kcsan/kcsan.h Normal file
View File

@ -0,0 +1,108 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* The Kernel Concurrency Sanitizer (KCSAN) infrastructure. For more info please
* see Documentation/dev-tools/kcsan.rst.
*/
#ifndef _KERNEL_KCSAN_KCSAN_H
#define _KERNEL_KCSAN_KCSAN_H
#include <linux/kcsan.h>
/* The number of adjacent watchpoints to check. */
#define KCSAN_CHECK_ADJACENT 1
/*
* Globally enable and disable KCSAN.
*/
extern bool kcsan_enabled;
/*
* Initialize debugfs file.
*/
void kcsan_debugfs_init(void);
enum kcsan_counter_id {
/*
* Number of watchpoints currently in use.
*/
KCSAN_COUNTER_USED_WATCHPOINTS,
/*
* Total number of watchpoints set up.
*/
KCSAN_COUNTER_SETUP_WATCHPOINTS,
/*
* Total number of data races.
*/
KCSAN_COUNTER_DATA_RACES,
/*
* Number of times no watchpoints were available.
*/
KCSAN_COUNTER_NO_CAPACITY,
/*
* A thread checking a watchpoint raced with another checking thread;
* only one will be reported.
*/
KCSAN_COUNTER_REPORT_RACES,
/*
* Observed data value change, but writer thread unknown.
*/
KCSAN_COUNTER_RACES_UNKNOWN_ORIGIN,
/*
* The access cannot be encoded to a valid watchpoint.
*/
KCSAN_COUNTER_UNENCODABLE_ACCESSES,
/*
* Watchpoint encoding caused a watchpoint to fire on mismatching
* accesses.
*/
KCSAN_COUNTER_ENCODING_FALSE_POSITIVES,
KCSAN_COUNTER_COUNT, /* number of counters */
};
/*
* Increment/decrement counter with given id; avoid calling these in fast-path.
*/
void kcsan_counter_inc(enum kcsan_counter_id id);
void kcsan_counter_dec(enum kcsan_counter_id id);
/*
* Returns true if data races in the function symbol that maps to func_addr
* (offsets are ignored) should *not* be reported.
*/
bool kcsan_skip_report_debugfs(unsigned long func_addr);
enum kcsan_report_type {
/*
* The thread that set up the watchpoint and briefly stalled was
* signalled that another thread triggered the watchpoint.
*/
KCSAN_REPORT_RACE_SIGNAL,
/*
* A thread found and consumed a matching watchpoint.
*/
KCSAN_REPORT_CONSUMED_WATCHPOINT,
/*
* No other thread was observed to race with the access, but the data
* value before and after the stall differs.
*/
KCSAN_REPORT_RACE_UNKNOWN_ORIGIN,
};
/*
* Print a race report from thread that encountered the race.
*/
void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
bool value_change, int cpu_id, enum kcsan_report_type type);
#endif /* _KERNEL_KCSAN_KCSAN_H */

320
kernel/kcsan/report.c Normal file
View File

@ -0,0 +1,320 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/preempt.h>
#include <linux/printk.h>
#include <linux/sched.h>
#include <linux/spinlock.h>
#include <linux/stacktrace.h>
#include "kcsan.h"
#include "encoding.h"
/*
* Max. number of stack entries to show in the report.
*/
#define NUM_STACK_ENTRIES 64
/*
* Other thread info: communicated from other racing thread to thread that set
* up the watchpoint, which then prints the complete report atomically. Only
* need one struct, as all threads should to be serialized regardless to print
* the reports, with reporting being in the slow-path.
*/
static struct {
const volatile void *ptr;
size_t size;
bool is_write;
int task_pid;
int cpu_id;
unsigned long stack_entries[NUM_STACK_ENTRIES];
int num_stack_entries;
} other_info = { .ptr = NULL };
/*
* This spinlock protects reporting and other_info, since other_info is usually
* required when reporting.
*/
static DEFINE_SPINLOCK(report_lock);
/*
* Special rules to skip reporting.
*/
static bool skip_report(bool is_write, bool value_change,
unsigned long top_frame)
{
if (IS_ENABLED(CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY) && is_write &&
!value_change) {
/*
* The access is a write, but the data value did not change.
*
* We opt-out of this filter for certain functions at request of
* maintainers.
*/
char buf[64];
snprintf(buf, sizeof(buf), "%ps", (void *)top_frame);
if (!strnstr(buf, "rcu_", sizeof(buf)) &&
!strnstr(buf, "_rcu", sizeof(buf)) &&
!strnstr(buf, "_srcu", sizeof(buf)))
return true;
}
return kcsan_skip_report_debugfs(top_frame);
}
static inline const char *get_access_type(bool is_write)
{
return is_write ? "write" : "read";
}
/* Return thread description: in task or interrupt. */
static const char *get_thread_desc(int task_id)
{
if (task_id != -1) {
static char buf[32]; /* safe: protected by report_lock */
snprintf(buf, sizeof(buf), "task %i", task_id);
return buf;
}
return "interrupt";
}
/* Helper to skip KCSAN-related functions in stack-trace. */
static int get_stack_skipnr(unsigned long stack_entries[], int num_entries)
{
char buf[64];
int skip = 0;
for (; skip < num_entries; ++skip) {
snprintf(buf, sizeof(buf), "%ps", (void *)stack_entries[skip]);
if (!strnstr(buf, "csan_", sizeof(buf)) &&
!strnstr(buf, "tsan_", sizeof(buf)) &&
!strnstr(buf, "_once_size", sizeof(buf))) {
break;
}
}
return skip;
}
/* Compares symbolized strings of addr1 and addr2. */
static int sym_strcmp(void *addr1, void *addr2)
{
char buf1[64];
char buf2[64];
snprintf(buf1, sizeof(buf1), "%pS", addr1);
snprintf(buf2, sizeof(buf2), "%pS", addr2);
return strncmp(buf1, buf2, sizeof(buf1));
}
/*
* Returns true if a report was generated, false otherwise.
*/
static bool print_report(const volatile void *ptr, size_t size, bool is_write,
bool value_change, int cpu_id,
enum kcsan_report_type type)
{
unsigned long stack_entries[NUM_STACK_ENTRIES] = { 0 };
int num_stack_entries =
stack_trace_save(stack_entries, NUM_STACK_ENTRIES, 1);
int skipnr = get_stack_skipnr(stack_entries, num_stack_entries);
int other_skipnr;
/*
* Must check report filter rules before starting to print.
*/
if (skip_report(is_write, true, stack_entries[skipnr]))
return false;
if (type == KCSAN_REPORT_RACE_SIGNAL) {
other_skipnr = get_stack_skipnr(other_info.stack_entries,
other_info.num_stack_entries);
/* value_change is only known for the other thread */
if (skip_report(other_info.is_write, value_change,
other_info.stack_entries[other_skipnr]))
return false;
}
/* Print report header. */
pr_err("==================================================================\n");
switch (type) {
case KCSAN_REPORT_RACE_SIGNAL: {
void *this_fn = (void *)stack_entries[skipnr];
void *other_fn = (void *)other_info.stack_entries[other_skipnr];
int cmp;
/*
* Order functions lexographically for consistent bug titles.
* Do not print offset of functions to keep title short.
*/
cmp = sym_strcmp(other_fn, this_fn);
pr_err("BUG: KCSAN: data-race in %ps / %ps\n",
cmp < 0 ? other_fn : this_fn,
cmp < 0 ? this_fn : other_fn);
} break;
case KCSAN_REPORT_RACE_UNKNOWN_ORIGIN:
pr_err("BUG: KCSAN: data-race in %pS\n",
(void *)stack_entries[skipnr]);
break;
default:
BUG();
}
pr_err("\n");
/* Print information about the racing accesses. */
switch (type) {
case KCSAN_REPORT_RACE_SIGNAL:
pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
get_access_type(other_info.is_write), other_info.ptr,
other_info.size, get_thread_desc(other_info.task_pid),
other_info.cpu_id);
/* Print the other thread's stack trace. */
stack_trace_print(other_info.stack_entries + other_skipnr,
other_info.num_stack_entries - other_skipnr,
0);
pr_err("\n");
pr_err("%s to 0x%px of %zu bytes by %s on cpu %i:\n",
get_access_type(is_write), ptr, size,
get_thread_desc(in_task() ? task_pid_nr(current) : -1),
cpu_id);
break;
case KCSAN_REPORT_RACE_UNKNOWN_ORIGIN:
pr_err("race at unknown origin, with %s to 0x%px of %zu bytes by %s on cpu %i:\n",
get_access_type(is_write), ptr, size,
get_thread_desc(in_task() ? task_pid_nr(current) : -1),
cpu_id);
break;
default:
BUG();
}
/* Print stack trace of this thread. */
stack_trace_print(stack_entries + skipnr, num_stack_entries - skipnr,
0);
/* Print report footer. */
pr_err("\n");
pr_err("Reported by Kernel Concurrency Sanitizer on:\n");
dump_stack_print_info(KERN_DEFAULT);
pr_err("==================================================================\n");
return true;
}
static void release_report(unsigned long *flags, enum kcsan_report_type type)
{
if (type == KCSAN_REPORT_RACE_SIGNAL)
other_info.ptr = NULL; /* mark for reuse */
spin_unlock_irqrestore(&report_lock, *flags);
}
/*
* Depending on the report type either sets other_info and returns false, or
* acquires the matching other_info and returns true. If other_info is not
* required for the report type, simply acquires report_lock and returns true.
*/
static bool prepare_report(unsigned long *flags, const volatile void *ptr,
size_t size, bool is_write, int cpu_id,
enum kcsan_report_type type)
{
if (type != KCSAN_REPORT_CONSUMED_WATCHPOINT &&
type != KCSAN_REPORT_RACE_SIGNAL) {
/* other_info not required; just acquire report_lock */
spin_lock_irqsave(&report_lock, *flags);
return true;
}
retry:
spin_lock_irqsave(&report_lock, *flags);
switch (type) {
case KCSAN_REPORT_CONSUMED_WATCHPOINT:
if (other_info.ptr != NULL)
break; /* still in use, retry */
other_info.ptr = ptr;
other_info.size = size;
other_info.is_write = is_write;
other_info.task_pid = in_task() ? task_pid_nr(current) : -1;
other_info.cpu_id = cpu_id;
other_info.num_stack_entries = stack_trace_save(
other_info.stack_entries, NUM_STACK_ENTRIES, 1);
spin_unlock_irqrestore(&report_lock, *flags);
/*
* The other thread will print the summary; other_info may now
* be consumed.
*/
return false;
case KCSAN_REPORT_RACE_SIGNAL:
if (other_info.ptr == NULL)
break; /* no data available yet, retry */
/*
* First check if this is the other_info we are expecting, i.e.
* matches based on how watchpoint was encoded.
*/
if (!matching_access((unsigned long)other_info.ptr &
WATCHPOINT_ADDR_MASK,
other_info.size,
(unsigned long)ptr & WATCHPOINT_ADDR_MASK,
size))
break; /* mismatching watchpoint, retry */
if (!matching_access((unsigned long)other_info.ptr,
other_info.size, (unsigned long)ptr,
size)) {
/*
* If the actual accesses to not match, this was a false
* positive due to watchpoint encoding.
*/
kcsan_counter_inc(
KCSAN_COUNTER_ENCODING_FALSE_POSITIVES);
/* discard this other_info */
release_report(flags, KCSAN_REPORT_RACE_SIGNAL);
return false;
}
/*
* Matching & usable access in other_info: keep other_info_lock
* locked, as this thread consumes it to print the full report;
* unlocked in release_report.
*/
return true;
default:
BUG();
}
spin_unlock_irqrestore(&report_lock, *flags);
goto retry;
}
void kcsan_report(const volatile void *ptr, size_t size, bool is_write,
bool value_change, int cpu_id, enum kcsan_report_type type)
{
unsigned long flags = 0;
kcsan_disable_current();
if (prepare_report(&flags, ptr, size, is_write, cpu_id, type)) {
if (print_report(ptr, size, is_write, value_change, cpu_id,
type) &&
panic_on_warn)
panic("panic_on_warn set ...\n");
release_report(&flags, type);
}
kcsan_enable_current();
}

121
kernel/kcsan/test.c Normal file
View File

@ -0,0 +1,121 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/random.h>
#include <linux/types.h>
#include "encoding.h"
#define ITERS_PER_TEST 2000
/* Test requirements. */
static bool test_requires(void)
{
/* random should be initialized for the below tests */
return prandom_u32() + prandom_u32() != 0;
}
/*
* Test watchpoint encode and decode: check that encoding some access's info,
* and then subsequent decode preserves the access's info.
*/
static bool test_encode_decode(void)
{
int i;
for (i = 0; i < ITERS_PER_TEST; ++i) {
size_t size = prandom_u32_max(MAX_ENCODABLE_SIZE) + 1;
bool is_write = !!prandom_u32_max(2);
unsigned long addr;
prandom_bytes(&addr, sizeof(addr));
if (WARN_ON(!check_encodable(addr, size)))
return false;
/* encode and decode */
{
const long encoded_watchpoint =
encode_watchpoint(addr, size, is_write);
unsigned long verif_masked_addr;
size_t verif_size;
bool verif_is_write;
/* check special watchpoints */
if (WARN_ON(decode_watchpoint(
INVALID_WATCHPOINT, &verif_masked_addr,
&verif_size, &verif_is_write)))
return false;
if (WARN_ON(decode_watchpoint(
CONSUMED_WATCHPOINT, &verif_masked_addr,
&verif_size, &verif_is_write)))
return false;
/* check decoding watchpoint returns same data */
if (WARN_ON(!decode_watchpoint(
encoded_watchpoint, &verif_masked_addr,
&verif_size, &verif_is_write)))
return false;
if (WARN_ON(verif_masked_addr !=
(addr & WATCHPOINT_ADDR_MASK)))
goto fail;
if (WARN_ON(verif_size != size))
goto fail;
if (WARN_ON(is_write != verif_is_write))
goto fail;
continue;
fail:
pr_err("%s fail: %s %zu bytes @ %lx -> encoded: %lx -> %s %zu bytes @ %lx\n",
__func__, is_write ? "write" : "read", size,
addr, encoded_watchpoint,
verif_is_write ? "write" : "read", verif_size,
verif_masked_addr);
return false;
}
}
return true;
}
/* Test access matching function. */
static bool test_matching_access(void)
{
if (WARN_ON(!matching_access(10, 1, 10, 1)))
return false;
if (WARN_ON(!matching_access(10, 2, 11, 1)))
return false;
if (WARN_ON(!matching_access(10, 1, 9, 2)))
return false;
if (WARN_ON(matching_access(10, 1, 11, 1)))
return false;
if (WARN_ON(matching_access(9, 1, 10, 1)))
return false;
return true;
}
static int __init kcsan_selftest(void)
{
int passed = 0;
int total = 0;
#define RUN_TEST(do_test) \
do { \
++total; \
if (do_test()) \
++passed; \
else \
pr_err("KCSAN selftest: " #do_test " failed"); \
} while (0)
RUN_TEST(test_requires);
RUN_TEST(test_encode_decode);
RUN_TEST(test_matching_access);
pr_info("KCSAN selftest: %d/%d tests passed\n", passed, total);
if (passed != total)
panic("KCSAN selftests failed");
return 0;
}
postcore_initcall(kcsan_selftest);

View File

@ -7,6 +7,12 @@ endif
# that is not a function of syscall inputs. E.g. involuntary context switches.
KCOV_INSTRUMENT := n
# There are numerous races here, however, most of them due to plain accesses.
# This would make it even harder for syzbot to find reproducers, because these
# bugs trigger without specific input. Disable by default, but should re-enable
# eventually.
KCSAN_SANITIZE := n
ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
# According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
# needed for x86 only. Why this used to be enabled for all architectures is beyond

View File

@ -2086,6 +2086,8 @@ source "lib/Kconfig.kgdb"
source "lib/Kconfig.ubsan"
source "lib/Kconfig.kcsan"
config ARCH_HAS_DEVMEM_IS_ALLOWED
bool

118
lib/Kconfig.kcsan Normal file
View File

@ -0,0 +1,118 @@
# SPDX-License-Identifier: GPL-2.0-only
config HAVE_ARCH_KCSAN
bool
menuconfig KCSAN
bool "KCSAN: watchpoint-based dynamic data race detector"
depends on HAVE_ARCH_KCSAN && !KASAN && STACKTRACE
default n
help
Kernel Concurrency Sanitizer is a dynamic data race detector, which
uses a watchpoint-based sampling approach to detect races. See
<file:Documentation/dev-tools/kcsan.rst> for more details.
if KCSAN
config KCSAN_DEBUG
bool "Debugging of KCSAN internals"
default n
config KCSAN_SELFTEST
bool "Perform short selftests on boot"
default y
help
Run KCSAN selftests on boot. On test failure, causes kernel to panic.
config KCSAN_EARLY_ENABLE
bool "Early enable during boot"
default y
help
If KCSAN should be enabled globally as soon as possible. KCSAN can
later be enabled/disabled via debugfs.
config KCSAN_NUM_WATCHPOINTS
int "Number of available watchpoints"
default 64
help
Total number of available watchpoints. An address range maps into a
specific watchpoint slot as specified in kernel/kcsan/encoding.h.
Although larger number of watchpoints may not be usable due to
limited number of CPUs, a larger value helps to improve performance
due to reducing cache-line contention. The chosen default is a
conservative value; we should almost never observe "no_capacity"
events (see /sys/kernel/debug/kcsan).
config KCSAN_UDELAY_TASK
int "Delay in microseconds (for tasks)"
default 80
help
For tasks, the microsecond delay after setting up a watchpoint.
config KCSAN_UDELAY_INTERRUPT
int "Delay in microseconds (for interrupts)"
default 20
help
For interrupts, the microsecond delay after setting up a watchpoint.
Interrupts have tighter latency requirements, and their delay should
be lower than for tasks.
config KCSAN_DELAY_RANDOMIZE
bool "Randomize above delays"
default y
help
If delays should be randomized, where the maximum is KCSAN_UDELAY_*.
If false, the chosen delays are always KCSAN_UDELAY_* defined above.
config KCSAN_SKIP_WATCH
int "Skip instructions before setting up watchpoint"
default 4000
help
The number of per-CPU memory operations to skip, before another
watchpoint is set up, i.e. one in KCSAN_WATCH_SKIP per-CPU
memory operations are used to set up a watchpoint. A smaller value
results in more aggressive race detection, whereas a larger value
improves system performance at the cost of missing some races.
config KCSAN_SKIP_WATCH_RANDOMIZE
bool "Randomize watchpoint instruction skip count"
default y
help
If instruction skip count should be randomized, where the maximum is
KCSAN_WATCH_SKIP. If false, the chosen value is always
KCSAN_WATCH_SKIP.
# Note that, while some of the below options could be turned into boot
# parameters, to optimize for the common use-case, we avoid this because: (a)
# it would impact performance (and we want to avoid static branch for all
# {READ,WRITE}_ONCE, atomic_*, bitops, etc.), and (b) complicate the design
# without real benefit. The main purpose of the below options are for use in
# fuzzer configs to control reported data races, and are not expected to be
# switched frequently by a user.
config KCSAN_REPORT_RACE_UNKNOWN_ORIGIN
bool "Report races of unknown origin"
default y
help
If KCSAN should report races where only one access is known, and the
conflicting access is of unknown origin. This type of race is
reported if it was only possible to infer a race due to a data value
change while an access is being delayed on a watchpoint.
config KCSAN_REPORT_VALUE_CHANGE_ONLY
bool "Only report races where watcher observed a data value change"
default y
help
If enabled and a conflicting write is observed via watchpoint, but
the data value of the memory location was observed to remain
unchanged, do not report the data race.
config KCSAN_IGNORE_ATOMICS
bool "Do not instrument marked atomic accesses"
default n
help
If enabled, never instruments marked atomic accesses. This results in
not reporting data races where one access is atomic and the other is
a plain access.
endif # KCSAN

View File

@ -24,6 +24,9 @@ KASAN_SANITIZE_string.o := n
CFLAGS_string.o := $(call cc-option, -fno-stack-protector)
endif
# Used by KCSAN while enabled, avoid recursion.
KCSAN_SANITIZE_random32.o := n
lib-y := ctype.o string.o vsprintf.o cmdline.o \
rbtree.o radix-tree.o timerqueue.o xarray.o \
idr.o extable.o \

View File

@ -7,6 +7,14 @@ KASAN_SANITIZE_slab_common.o := n
KASAN_SANITIZE_slab.o := n
KASAN_SANITIZE_slub.o := n
# These produce frequent data race reports: most of them are due to races on
# the same word but accesses to different bits of that word. Re-enable KCSAN
# for these when we have more consensus on what to do about them.
KCSAN_SANITIZE_slab_common.o := n
KCSAN_SANITIZE_slab.o := n
KCSAN_SANITIZE_slub.o := n
KCSAN_SANITIZE_page_alloc.o := n
# These files are disabled because they produce non-interesting and/or
# flaky coverage that is not a function of syscall inputs. E.g. slab is out of
# free pages, or a task is migrated between nodes.

6
scripts/Makefile.kcsan Normal file
View File

@ -0,0 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
ifdef CONFIG_KCSAN
CFLAGS_KCSAN := -fsanitize=thread
endif # CONFIG_KCSAN

View File

@ -152,6 +152,16 @@ _c_flags += $(if $(patsubst n%,, \
$(CFLAGS_KCOV))
endif
#
# Enable KCSAN flags except some files or directories we don't want to check
# (depends on variables KCSAN_SANITIZE_obj.o, KCSAN_SANITIZE)
#
ifeq ($(CONFIG_KCSAN),y)
_c_flags += $(if $(patsubst n%,, \
$(KCSAN_SANITIZE_$(basetarget).o)$(KCSAN_SANITIZE)y), \
$(CFLAGS_KCSAN))
endif
# $(srctree)/$(src) for including checkin headers from generated source files
# $(objtree)/$(obj) for including generated headers from checkin source files
ifeq ($(KBUILD_EXTMOD),)

View File

@ -20,7 +20,7 @@ gen_param_check()
# We don't write to constant parameters
[ ${type#c} != ${type} ] && rw="read"
printf "\tkasan_check_${rw}(${name}, sizeof(*${name}));\n"
printf "\t__atomic_check_${rw}(${name}, sizeof(*${name}));\n"
}
#gen_param_check(arg...)
@ -107,7 +107,7 @@ cat <<EOF
#define ${xchg}(ptr, ...) \\
({ \\
typeof(ptr) __ai_ptr = (ptr); \\
kasan_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
__atomic_check_write(__ai_ptr, ${mult}sizeof(*__ai_ptr)); \\
arch_${xchg}(__ai_ptr, __VA_ARGS__); \\
})
EOF
@ -148,6 +148,19 @@ cat << EOF
#include <linux/build_bug.h>
#include <linux/kasan-checks.h>
#include <linux/kcsan-checks.h>
static inline void __atomic_check_read(const volatile void *v, size_t size)
{
kasan_check_read(v, size);
kcsan_check_atomic_read(v, size);
}
static inline void __atomic_check_write(const volatile void *v, size_t size)
{
kasan_check_write(v, size);
kcsan_check_atomic_write(v, size);
}
EOF

View File

@ -466,6 +466,24 @@ static const char *uaccess_safe_builtin[] = {
"__asan_report_store4_noabort",
"__asan_report_store8_noabort",
"__asan_report_store16_noabort",
/* KCSAN */
"kcsan_found_watchpoint",
"kcsan_setup_watchpoint",
/* KCSAN/TSAN */
"__tsan_func_entry",
"__tsan_func_exit",
"__tsan_read_range",
"__tsan_write_range",
"__tsan_read1",
"__tsan_read2",
"__tsan_read4",
"__tsan_read8",
"__tsan_read16",
"__tsan_write1",
"__tsan_write2",
"__tsan_write4",
"__tsan_write8",
"__tsan_write16",
/* KCOV */
"write_comp_data",
"__sanitizer_cov_trace_pc",