Commit Graph

889 Commits

Author SHA1 Message Date
Paul E. McKenney
609af1cdf0 Merge branches 'expedited.2018.07.12a', 'fixes.2018.07.12a', 'srcu.2018.06.25b' and 'torture.2018.06.25b' into HEAD
expedited.2018.07.12a: Expedited grace-period updates.
fixes.2018.07.12a: Pre-gp_seq miscellaneous fixes.
srcu.2018.06.25b: SRCU updates.
torture.2018.06.25b: Pre-gp_seq torture-test updates.
2018-07-12 14:26:14 -07:00
Paul E. McKenney
18390aeae7 rcu: Make rcu_gp_cleanup() write only once to ->gp_flags
At the end of rcu_gp_cleanup(), if another grace period is needed, but
not via rcu_accelerate_cbs(), the ->gp_flags field is written twice,
once when making the new grace-period request, and once when clearing
all other types of requests.  This commit therefore adds an else-clause
to avoid this double write.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-07-12 14:25:17 -07:00
Paul E. McKenney
26d950a945 rcu: Diagnostics for grace-period startup hangs
This commit causes a splat if RCU is idle and a request for a new grace
period is ignored for more than one second.  This splat normally indicates
that some code path asked for a new grace period, but failed to wake up
the RCU grace-period kthread.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix bug located by Dan Carpenter and his static checker. ]
[ paulmck: Fix self-deadlock bug located 0day test robot. ]
[ paulmck: Disable unless CONFIG_PROVE_RCU=y. ]
2018-07-12 14:24:42 -07:00
Boqun Feng
fcc6354365 rcu: Make expedited GPs handle CPU 0 being offline
Currently, the parallelized initialization of expedited grace periods uses
the workqueue associated with each rcu_node structure's ->grplo field.
This works fine unless that CPU is offline.  This commit therefore uses
the CPU corresponding to the lowest-numbered online CPU, or just queues
the work on WORK_CPU_UNBOUND if there are no online CPUs corresponding
to this rcu_node structure.

Note that this patch uses cpu_is_offline() instead of the usual approach
of checking bits in the rcu_node structure's ->qsmaskinitnext field.  This
is safe because preemption is disabled across both the cpu_is_offline()
check and the call to queue_work_on().

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
[ paulmck: Disable preemption to close offline race window. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply Peter Zijlstra feedback on CPU selection. ]
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2018-07-12 12:36:06 -07:00
Paul E. McKenney
8c42b1f39f rcu: Exclude near-simultaneous RCU CPU stall warnings
There is a two-jiffy delay between the time that a CPU will self-report
an RCU CPU stall warning and the time that some other CPU will report a
warning on behalf of the first CPU.  This has worked well in the past,
but on busy systems, it is possible for the two warnings to overlap,
which makes interpreting them extremely difficult.

This commit therefore uses a cmpxchg-based timing decision that
allows only one report in a given one-minute period (assuming default
stall-warning Kconfig parameters).  This approach will of course fail
if you are seeing minute-long vCPU preemption, but in that case the
overlapping RCU CPU stall warnings are the least of your worries.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:56 -07:00
Boqun Feng
ce11fae8d4 rcu: Use the proper lockdep annotation in dump_blkd_tasks()
Sparse reported this:

| kernel/rcu/tree_plugin.h:814:9: warning: incorrect type in argument 1 (different modifiers)
| kernel/rcu/tree_plugin.h:814:9:    expected struct lockdep_map const *lock
| kernel/rcu/tree_plugin.h:814:9:    got struct lockdep_map [noderef] *<noident>

This is caused by using vanilla lockdep annotations on rcu_node::lock,
and that requires accessing ->lock of rcu_node directly. However we need
to keep rcu_node::lock __private to avoid breaking its extra ordering
guarantee. And we have a dedicated lockdep annotation for
rcu_node::lock, so use it.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:55 -07:00
Paul E. McKenney
4bc8d55574 rcu: Add debugging info to assertion
The WARN_ON_ONCE(rcu_preempt_blocked_readers_cgp()) in
rcu_gp_cleanup() triggers (inexplicably, of course) every so often.
This commit therefore extracts more information.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-26 12:25:55 -07:00
Paul E. McKenney
6050003763 torture: Keep old-school dmesg format
This commit adds "#define pr_fmt(fmt) fmt" to the torture-test files
in order to keep the current dmesg format.  Once Joe's commits have
hit mainline, these definitions will be changed in order to automatically
generate the dmesg line prefix that the scripts expect.  This will have
the beneficial side-effect of allowing printk() formats to be used more
widely and of shortening some pr_*() lines.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Joe Perches <joe@perches.com>
2018-06-25 11:30:10 -07:00
Paul E. McKenney
90127d605f torture: Make online/offline messages appear only for verbose=2
Some bugs reproduce quickly only at high CPU-hotplug rates, so the
rcutorture TREE03 scenario now has only 200 milliseconds spacing between
CPU-hotplug operations.  At this rate, the torture-test pair of console
messages per operation becomes a bit voluminous.  This commit therefore
converts the torture-test set of "verbose" kernel-boot arguments from
bool to int, and prints the extra console messages only when verbose=2.
The default is still verbose=1.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:30:10 -07:00
Paul E. McKenney
5ab07a8df4 srcu: Add address of first callback to rcutorture output
This commit adds the address of the first callback to the per-CPU rcutorture
output in order to allow lost wakeups to be more efficiently tracked down.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
17294ce6a4 srcu: Document that srcu_funnel_gp_start() implies srcu_funnel_exp_start()
This commit updates the header comment of srcu_funnel_gp_start() to
document the fact that srcu_funnel_gp_start() does the work of
srcu_funnel_exp_start(), in some cases by invoking it directly.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
5ef98a6328 srcu: Fix typos in __call_srcu() header comment
This commit simply changes some copy-pasta call_rcu() instances to
the correct call_srcu().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:26:24 -07:00
Paul E. McKenney
5257514d88 rcu: Make expedited grace period use direct call on last leaf
During expedited grace-period initialization, a work item is scheduled
for each leaf rcu_node structure.  However, that initialization code
is itself (normally) executing from a workqueue, so one of the leaf
rcu_node structures could just as well be handled by that pre-existing
workqueue, and with less overhead.  This commit therefore uses a
shiny new rcu_is_leaf_node() macro to execute the last leaf rcu_node
structure's initialization directly from the pre-existing workqueue.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-06-25 11:25:41 -07:00
Kees Cook
42bc47b353 treewide: Use array_size() in vmalloc()
The vmalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:

        vmalloc(a * b)

with:
        vmalloc(array_size(a, b))

as well as handling cases of:

        vmalloc(a * b * c)

with:

        vmalloc(array3_size(a, b, c))

This does, however, attempt to ignore constant size factors like:

        vmalloc(4 * 1024)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vmalloc(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vmalloc(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vmalloc(C1 * C2 * C3, ...)
|
  vmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vmalloc(C1 * C2, ...)
|
  vmalloc(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Peter Zijlstra
f64c6013a2 rcu/x86: Provide early rcu_cpu_starting() callback
The x86/mtrr code does horrific things because hardware. It uses
stop_machine_from_inactive_cpu(), which does a wakeup (of the stopper
thread on another CPU), which uses RCU, all before the CPU is onlined.

RCU complains about this, because wakeups use RCU and RCU does
(rightfully) not consider offline CPUs for grace-periods.

Fix this by initializing RCU way early in the MTRR case.

Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Add !SMP support, per 0day Test Robot report. ]
2018-05-22 16:12:26 -07:00
Paul E. McKenney
22df7316ac Merge branches 'exp.2018.05.15a', 'fixes.2018.05.15a', 'lock.2018.05.15a' and 'torture.2018.05.15a' into HEAD
exp.2018.05.15a: Parallelize expedited grace-period initialization.
fixes.2018.05.15a: Miscellaneous fixes.
lock.2018.05.15a: Decrease lock contention on root rcu_node structure,
	which is a step towards merging RCU flavors.
torture.2018.05.15a: Torture-test updates.
2018-05-15 10:33:05 -07:00
Paul E. McKenney
034777d7f5 rcutorture: Print end-of-test state
This commit adds end-of-test state printout to help check whether RCU
shut down nicely.  Note that this printout only helps for flavors of
RCU that are not used much by the kernel.  In particular, for normal
RCU having a grace period in progress is expected behavior.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:32:08 -07:00
Paul E. McKenney
a458360af6 rcu: Drop early GP request check from rcu_gp_kthread()
Now that grace-period requests use funnel locking and now that they
set ->gp_flags to RCU_GP_FLAG_INIT even when the RCU grace-period
kthread has not yet started, rcu_gp_kthread() no longer needs to check
need_any_future_gp() at startup time.  This commit therefore removes
this check.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:31:04 -07:00
Paul E. McKenney
c1935209df rcu: Simplify and inline cpu_needs_another_gp()
Now that RCU no longer relies on failsafe checks, cpu_needs_another_gp()
can be greatly simplified.  This simplification eliminates the last
call to rcu_future_needs_gp() and to rcu_segcblist_future_gp_needed(),
both of which which can then be eliminated.  And then, because
cpu_needs_another_gp() is called only from __rcu_pending(), it can be
inlined and eliminated.

This commit carries out the simplification, inlining, and elimination
called out above.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:59 -07:00
Paul E. McKenney
384f77f4cb rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp()
All of the cpu_needs_another_gp() function's checks (except for
newly arrived callbacks) have been subsumed into the rcu_gp_cleanup()
function's scan of the rcu_node tree.  This commit therefore drops the
call to cpu_needs_another_gp().  The check for newly arrived callbacks
is supplied by rcu_accelerate_cbs().  Any needed advancing (as in the
earlier rcu_advance_cbs() call) will be supplied when the corresponding
CPU becomes aware of the end of the now-completed grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:54 -07:00
Paul E. McKenney
665f08f1ce rcu: Make rcu_start_this_gp() check for out-of-range requests
If rcu_start_this_gp() is invoked with a requested grace period more
than three in the future, then either the ->need_future_gp[] array
needs to be bigger or the caller needs to be repaired.  This commit
therefore adds a WARN_ON_ONCE() checking for this condition.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:48 -07:00
Paul E. McKenney
360e0da67e rcu: Add funnel locking to rcu_start_this_gp()
The rcu_start_this_gp() function had a simple form of funnel locking that
used only the leaves and root of the rcu_node tree, which is fine for
systems with only a few hundred CPUs, but sub-optimal for systems having
thousands of CPUs.  This commit therefore adds full-tree funnel locking.

This variant of funnel locking is unusual in the following ways:

1.	The leaf-level rcu_node structure's ->lock is held throughout.
	Other funnel-locking implementations drop the leaf-level lock
	before progressing to the next level of the tree.

2.	Funnel locking can be started at the root, which is convenient
	for code that already holds the root rcu_node structure's ->lock.
	Other funnel-locking implementations start at the leaves.

3.	If an rcu_node structure other than the initial one believes
	that a grace period is in progress, it is not necessary to
	go further up the tree.  This is because grace-period cleanup
	scans the full tree, so that marking the need for a subsequent
	grace period anywhere in the tree suffices -- but only if
	a grace period is currently in progress.

4.	It is possible that the RCU grace-period kthread has not yet
	started, and this case must be handled appropriately.

However, the general approach of using a tree to control lock contention
is still in place.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:37 -07:00
Paul E. McKenney
41e80595ab rcu: Make rcu_start_future_gp() caller select grace period
The rcu_accelerate_cbs() function selects a grace-period target, which
it uses to have rcu_segcblist_accelerate() assign numbers to recently
queued callbacks.  Then it invokes rcu_start_future_gp(), which selects
a grace-period target again, which is a bit pointless.  This commit
therefore changes rcu_start_future_gp() to take the grace-period target as
a parameter, thus avoiding double selection.  This commit also changes
the name of rcu_start_future_gp() to rcu_start_this_gp() to reflect
this change in functionality, and also makes a similar change to the
name of trace_rcu_future_gp().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:32 -07:00
Paul E. McKenney
d5cd96851d rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp()
The rcu_start_gp_advanced() is invoked only from rcu_start_future_gp() and
much of its code is redundant when invoked from that context.  This commit
therefore inlines rcu_start_gp_advanced() into rcu_start_future_gp(),
then removes rcu_start_gp_advanced().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:27 -07:00
Paul E. McKenney
a824a287f6 rcu: Clear request other than RCU_GP_FLAG_INIT at GP end
Once the grace period has ended, any RCU_GP_FLAG_FQS requests are
irrelevant:  The grace period has ended, so there is no longer any
point in forcing quiescent states in order to try to make it end sooner.
This commit therefore causes rcu_gp_cleanup() to clear any bits other
than RCU_GP_FLAG_INIT from ->gp_flags at the end of the grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:20 -07:00
Paul E. McKenney
a508aa597e rcu: Cleanup, don't put ->completed into an int
It is true that currently only the low-order two bits are used, so
there should be no problem given modern machines and compilers, but
good hygiene and maintainability dictates use of an unsigned long
instead of an int.  This commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:15 -07:00
Paul E. McKenney
bd7af8463b rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs()
The __rcu_process_callbacks() function currently checks to see if
the current CPU needs a grace period and also if there is any other
reason to kick off a new grace period.  This is one of the fail-safe
checks that has been rendered unnecessary by the changes that increase
the accuracy of rcu_gp_cleanup()'s estimate as to whether another grace
period is required.  Because this particular fail-safe involved acquiring
the root rcu_node structure's ->lock, which has seen excessive contention
in real life, this fail-safe needs to go.

However, one check must remain, namely the check for newly arrived
RCU callbacks that have not yet been associated with a grace period.
One might hope that the checks in __note_gp_changes(), which is invoked
indirectly from rcu_check_quiescent_state(), would suffice, but this
function won't be invoked at all if RCU is idle.  It is therefore necessary
to replace the fail-safe checks with a simpler check for newly arrived
callbacks during an RCU idle period, which is exactly what this commit
does.  This change removes the final call to rcu_start_gp(), so this
function is removed as well.

Note that lockless use of cpu_needs_another_gp() is racy, but that
these races are harmless in this case.  If RCU really is idle, the
values will not change, so the return value from cpu_needs_another_gp()
will be correct.  If RCU is not idle, the resulting redundant call to
rcu_accelerate_cbs() will be harmless, and might even have the benefit
of reducing grace-period latency a bit.

This commit also moves interrupt disabling into the "if" statement to
improve real-time response a bit.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:30:03 -07:00
Paul E. McKenney
a6058d85a2 rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition
When __call_rcu_core() notices excessive numbers of callbacks pending
on the current CPU, we know that at least one of them is not yet
classified, namely the one that was just now queued.  Therefore, it
is not necessary to invoke rcu_start_gp() and thus not necessary to
acquire the root rcu_node structure's ->lock.  This commit therefore
replaces the rcu_start_gp() with rcu_accelerate_cbs(), thus replacing
an acquisition of the root rcu_node structure's ->lock with that of
this CPU's leaf rcu_node structure.

This decreases contention on the root rcu_node structure's ->lock.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:57 -07:00
Paul E. McKenney
ec4eaccef4 rcu: Make rcu_migrate_callbacks wake GP kthread when needed
The rcu_migrate_callbacks() function invokes rcu_advance_cbs()
twice, ignoring the return value.  This is OK at pressent because of
failsafe code that does the wakeup when needed.  However, this failsafe
code acquires the root rcu_node structure's lock frequently, while
rcu_migrate_callbacks() does so only once per CPU-offline operation.

This commit therefore makes rcu_migrate_callbacks()
wake up the RCU GP kthread when either call to rcu_advance_cbs()
returns true, thus removing need for the failsafe code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:51 -07:00
Paul E. McKenney
6f576e2816 rcu: Convert ->need_future_gp[] array to boolean
There is no longer any need for ->need_future_gp[] to count the number of
requests for future grace periods, so this commit converts the additions
to assignments to "true" and reduces the size of each element to one byte.
While we are in the area, fix an obsolete comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:46 -07:00
Paul E. McKenney
0ae94e00ce rcu: Make rcu_future_needs_gp() check all ->need_future_gps[] elements
Currently, the rcu_future_needs_gp() function checks only the current
element of the ->need_future_gps[] array, which might miss elements that
were offset from the expected element, for example, due to races with
the start or the end of a grace period.  This commit therefore makes
rcu_future_needs_gp() use the need_any_future_gp() macro to check all
of the elements of this array.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:41 -07:00
Paul E. McKenney
51af970d19 rcu: Avoid losing ->need_future_gp[] values due to GP start/end races
The rcu_cbs_completed() function provides the value of ->completed
at which new callbacks can safely be invoked.  This is recorded in
two-element ->need_future_gp[] arrays in the rcu_node structure, and
the elements of these arrays corresponding to the just-completed grace
period are zeroed at the end of that grace period.  However, the
rcu_cbs_completed() function can return the current ->completed value
plus either one or two, so it is possible for the corresponding
->need_future_gp[] entry to be cleared just after it was set, thus
losing a request for a future grace period.

This commit avoids this race by expanding ->need_future_gp[] to four
elements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:33 -07:00
Paul E. McKenney
fb31340f8a rcu: Make rcu_gp_cleanup() more accurately predict need for new GP
Currently, rcu_gp_cleanup() scans the rcu_node tree in order to reset
state to reflect the end of the grace period.  It also checks to see
whether a new grace period is needed, but in a number of cases, rather
than directly cause the new grace period to be immediately started, it
instead leaves the grace-period-needed state where various fail-safes
can find it.  This works fine, but results in higher contention on the
root rcu_node structure's ->lock, which is undesirable, and contention
on that lock has recently become noticeable.

This commit therefore makes rcu_gp_cleanup() immediately start a new
grace period if there is any need for one.

It is quite possible that it will later be necessary to throttle the
grace-period rate, but that can be dealt with when and if.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:28 -07:00
Paul E. McKenney
5fe0a56298 rcu: Make rcu_gp_kthread() check for early-boot activity
The rcu_gp_kthread() function immediately sleeps waiting to be notified
of the need for a new grace period, which currently works because there
are a number of code sequences that will provide the needed wakeup later.
However, some of these code sequences need to acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes rcu_gp_kthread() check for early-boot activity
when it starts up, omitting the initial sleep in that case.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:24 -07:00
Paul E. McKenney
c91a8675b9 rcu: Add accessor macros for the ->need_future_gp[] array
Accessors for the ->need_future_gp[] array are currently open-coded,
which makes them difficult to change.  To improve maintainability, this
commit adds need_future_gp_mask() to compute the indexing mask from the
array size, need_future_gp_element() to access the element corresponding
to the specified grace-period number, and need_any_future_gp() to
determine if any future grace period is needed.  This commit also applies
need_future_gp_element() to existing open-coded single-element accesses.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:18 -07:00
Paul E. McKenney
825a9911f6 rcu: Make rcu_start_future_gp()'s grace-period check more precise
The rcu_start_future_gp() function uses a sloppy check for a grace
period being in progress, which works today because there are a number
of code sequences that resolve the resulting races.  However, some of
these race-resolution code sequences must acquire the root rcu_node
structure's ->lock, and contention on that lock has started manifesting.
This commit therefore makes rcu_start_future_gp() check more precise,
eliminating the sloppy lockless check of the rcu_state structure's ->gpnum
and ->completed fields.  The effect is that rcu_start_future_gp() will
sometimes unnecessarily attempt to start a new grace period, but this
overhead will be reduced later using funnel locking.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:13 -07:00
Paul E. McKenney
9036c2ffd5 rcu: Improve non-root rcu_cbs_completed() accuracy
When rcu_cbs_completed() is invoked on a non-root rcu_node structure,
it unconditionally assumes that two grace periods must complete before
the callbacks at hand can be invoked.  This is overly conservative because
if that non-root rcu_node structure believes that no grace period is in
progress, and if the corresponding rcu_state structure's ->gpnum field
has not yet been incremented, then these callbacks may safely be invoked
after only one grace period has completed.

This change is required to permit grace-period start requests to use
funnel locking, which is in turn permitted to reduce root rcu_node ->lock
contention, which has been observed by Nick Piggin.  Furthermore, such
contention will likely be increased by the merging of RCU-bh, RCU-preempt,
and RCU-sched, so it makes sense to take steps to decrease it.

This commit therefore improves the accuracy of rcu_cbs_completed() when
invoked on a non-root rcu_node structure as described above.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:29:07 -07:00
Paul E. McKenney
5b4c11d54b rcu: Add leaf-node macros
This commit adds rcu_first_leaf_node() that returns a pointer to
the first leaf rcu_node structure in the specified RCU flavor and an
rcu_is_leaf_node() that returns true iff the specified rcu_node structure
is a leaf.  This commit also uses these macros where appropriate.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:28:07 -07:00
Paul E. McKenney
f7194ac32c srcu: Add cleanup_srcu_struct_quiesced()
The current cleanup_srcu_struct() flushes work, which prevents it
from being invoked from some workqueue contexts, as well as from
atomic (non-blocking) contexts.  This patch therefore introduced a
cleanup_srcu_struct_quiesced(), which can be invoked only after all
activity on the specified srcu_struct has completed.  This restriction
allows cleanup_srcu_struct_quiesced() to be invoked from workqueue
contexts as well as from atomic contexts.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nitzan Carmi <nitzanc@mellanox.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:56 -07:00
Yury Norov
17672480fb rcu: Declare rcu_eqs_special_set() in public header
Because rcu_eqs_special_set() is declared only in internal header
kernel/rcu/tree.h and stubbed in include/linux/rcutiny.h, it is
inaccessible outside of the RCU implementation.  This patch therefore
moves the  rcu_eqs_special_set() declaration to include/linux/rcutree.h,
which allows it to be used in non-rcu kernel code.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:51 -07:00
Paul E. McKenney
265f5f28f0 rcu: Update rcu_bind_gp_kthread() header comment
The header comment for rcu_bind_gp_kthread() refers to sysidle, which
is no longer with us.  However, it is still important to bind RCU's
grace-period kthreads to the housekeeping CPU(s), so rather than remove
rcu_bind_gp_kthread(), this commit updates the comment.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:46 -07:00
Paul E. McKenney
0e5da22e3f rcu: Move __rcu_read_lock() and __rcu_read_unlock() to tree_plugin.h
The __rcu_read_lock() and __rcu_read_unlock() functions were moved
to kernel/rcu/update.c in order to implement tiny preemptible RCU.
However, tiny preemptible RCU was removed from the kernel a long time
ago, so this commit belatedly moves them back into the only remaining
preemptible-RCU code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:41 -07:00
Paul E. McKenney
cee4393989 rcu: Rename cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs()
Commit e31d28b6ab ("trace: Eliminate cond_resched_rcu_qs() in favor
of cond_resched()") substituted cond_resched() for the earlier call
to cond_resched_rcu_qs().  However, the new-age cond_resched() does
not do anything to help RCU-tasks grace periods because (1) RCU-tasks
is only enabled when CONFIG_PREEMPT=y and (2) cond_resched() is a
complete no-op when preemption is enabled.  This situation results
in hangs when running the trace benchmarks.

A number of potential fixes were discussed on LKML
(https://lkml.kernel.org/r/20180224151240.0d63a059@vmware.local.home),
including making cond_resched() not be a no-op; making cond_resched()
not be a no-op, but only when running tracing benchmarks; reverting
the aforementioned commit (which works because cond_resched_rcu_qs()
does provide an RCU-tasks quiescent state; and adding a call to the
scheduler/RCU rcu_note_voluntary_context_switch() function.  All were
deemed unsatisfactory, either due to added cond_resched() overhead or
due to magic functions inviting cargo culting.

This commit renames cond_resched_rcu_qs() to cond_resched_tasks_rcu_qs(),
which provides a clear hint as to what this function is doing and
why and where it should be used, and then replaces the call to
cond_resched() with cond_resched_tasks_rcu_qs() in the trace benchmark's
benchmark_event_kthread() function.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:29 -07:00
Byungchul Park
6fba2b3767 rcu: Remove deprecated RCU debugfs tracing code
Commit ae91aa0adb ("rcu: Remove debugfs tracing") removed the
RCU debugfs tracing code, but did not remove the no-longer used
->exp_workdone{0,1,2,3} fields in the srcu_data structure.  This commit
therefore removes these fields along with the code that uselessly
updates them.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:23 -07:00
Byungchul Park
efcd2d5436 rcu: Call wake_nocb_leader_defer() with 'FORCE' when nocb_q_count is high
If an excessive number of callbacks have been queued, but the NOCB
leader kthread's wakeup must be deferred, then we should wake up the
leader unconditionally once it is safe to do so.

This was handled correctly in commit fbce7497ee ("rcu: Parallelize and
economize NOCB kthread wakeups"), but then commit 8be6e1b15c ("rcu:
Use timer as backstop for NOCB deferred wakeups") passed RCU_NOCB_WAKE
instead of the correct RCU_NOCB_WAKE_FORCE to wake_nocb_leader_defer().
As an interesting aside, RCU_NOCB_WAKE_FORCE is never passed to anything,
which should have been taken as a hint.  ;-)

This commit therefore passes RCU_NOCB_WAKE_FORCE instead of RCU_NOCB_WAKE
to wake_nocb_leader_defer() when a callback is queued onto a NOCB CPU
that already has an excessive number of callbacks pending.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:18 -07:00
Paul E. McKenney
ef12620626 rcu: Don't allocate rcu_nocb_mask if no one needs it
Commit 44c65ff2e3 ("rcu: Eliminate NOCBs CPU-state Kconfig options")
made allocation of rcu_nocb_mask depend only on the rcu_nocbs=,
nohz_full=, or isolcpus= kernel boot parameters.  However, it failed
to change the initial value of rcu_init_nohz()'s local variable
need_rcu_nocb_mask to false, which can result in useless allocation
of an all-zero rcu_nocb_mask.  This commit therefore fixes this bug by
changing the initial value of need_rcu_nocb_mask to false.

While we are in the area, also correct the error message that is printed
when someone specifies that can-never-exist CPUs should be NOCBs CPUs.

Reported-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Byungchul Park <byungchul.park@lge.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:11 -07:00
Byungchul Park
be01b4cab1 rcu: Inline rcu_preempt_do_callback() into its sole caller
The rcu_preempt_do_callbacks() function was introduced in commit
09223371dea(rcu: Use softirq to address performance regression), where it
was necessary to handle kernel builds both containing and not containing
RCU-preempt.  Since then, various changes (most notably f8b7fc6b51
("rcu: use softirq instead of kthreads except when RCU_BOOST=y")) have
resulted in this function being invoked only from rcu_kthread_do_work(),
which is present only in kernels containing RCU-preempt, which in turn
means that the rcu_preempt_do_callbacks() function is no longer needed.

This commit therefore inlines rcu_preempt_do_callbacks() into its
sole remaining caller and also removes the rcu_state_p and rcu_data_p
indirection for added clarity.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[ paulmck: Remove the rcu_state_p and rcu_data_p indirection. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:27:00 -07:00
Boqun Feng
55ebfce060 rcu: exp: Protect all sync_rcu_preempt_exp_done() with rcu_node lock
Currently some callsites of sync_rcu_preempt_exp_done() are not called
with the corresponding rcu_node's ->lock held, which could introduces
bugs as per Paul:

o	CPU 0 in sync_rcu_preempt_exp_done() reads ->exp_tasks and
	sees that it is NULL.

o	CPU 1 blocks within an RCU read-side critical section, so
	it enqueues the task and points ->exp_tasks at it and
	clears CPU 1's bit in ->expmask.

o	All other CPUs clear their bits in ->expmask.

o	CPU 0 reads ->expmask, sees that it is zero, so incorrectly
	concludes that all quiescent states have completed, despite
	the fact that ->exp_tasks is non-NULL.

To fix this, sync_rcu_preempt_exp_unlocked() is introduced to replace
lockless callsites of sync_rcu_preempt_exp_done().

Further, a lockdep annotation is added into sync_rcu_preempt_exp_done()
to prevent mis-use in the future.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:26:07 -07:00
Boqun Feng
7be8c56f8f rcu: exp: Fix "must hold exp_mutex" comments for QS reporting functions
Since commit d9a3da0699 ("rcu: Add expedited grace-period support
for preemptible RCU"), there are comments for some funtions in
rcu_report_exp_rnp()'s call-chain saying that exp_mutex or its
predecessors needs to be held.

However, exp_mutex and its predecessors were used only to synchronize
between GPs, and it is clear that all variables visited by those functions
are under the protection of rcu_node's ->lock. Moreover, those functions
are currently called without held exp_mutex, and seems that doesn't
introduce any trouble.

So this patch fixes this problem by updating the comments to match the
current code.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Fixes: d9a3da0699 ("rcu: Add expedited grace-period support for preemptible RCU")
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:26:01 -07:00
Paul E. McKenney
25f3d7effa rcu: Parallelize expedited grace-period initialization
The latency of RCU expedited grace periods grows with increasing numbers
of CPUs, eventually failing to be all that expedited.  Much of the growth
in latency is in the initialization phase, so this commit uses workqueues
to carry out this initialization concurrently on a rcu_node-by-rcu_node
basis.

This change makes use of a new rcu_par_gp_wq because flushing a work
item from another work item running from the same workqueue can result
in deadlock.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Nicholas Piggin <npiggin@gmail.com>
2018-05-15 10:25:44 -07:00
Paul E. McKenney
338c46403f Merge branches 'fixes.2018.02.23a', 'srcu.2018.02.20a' and 'torture.2018.02.20a' into HEAD
fixes.2018.02.23a: Miscellaneous fixes
srcu.2018.02.20a: SRCU updates
torture.2018.02.20a: Torture-test updates
2018-02-23 15:15:41 -08:00
Paul E. McKenney
ad7c946b35 rcu: Create RCU-specific workqueues with rescuers
RCU's expedited grace periods can participate in out-of-memory deadlocks
due to all available system_wq kthreads being blocked and there not being
memory available to create more.  This commit prevents such deadlocks
by allocating an RCU-specific workqueue_struct at early boot time, and
providing it with a rescuer to ensure forward progress.  This uses the
shiny new init_rescuer() function provided by Tejun (but indirectly).

This commit also causes SRCU to use this new RCU-specific
workqueue_struct.  Note that SRCU's use of workqueues never blocks them
waiting for readers, so this should be safe from a forward-progress
viewpoint.  Note that this moves SRCU from system_power_efficient_wq
to a normal workqueue.  In the unlikely event that this results in
measurable degradation, a separate power-efficient workqueue will be
creates for SRCU.

Reported-by: Prateek Sood <prsood@codeaurora.org>
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
2018-02-23 15:14:40 -08:00
Paul E. McKenney
85ba6bfe8b torture: Provide more sensible nreader/nwriter defaults for rcuperf
The default values for nreader and nwriter are apparently not all that
user-friendly, resulting in people doing scalability tests that ran all
runs at large scale.  This commit therefore makes both the nreaders and
nwriters module default to the number of CPUs, and adds a comment to
rcuperf.c stating that the number of CPUs should be specified using the
nr_cpus kernel boot parameter.  This commit also eliminates the redundant
rcuperf scripting specification of default values for these parameters.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:22:01 -08:00
Paul E. McKenney
db0c1a8aba rcutorture: Record which grace-period primitives are tested
The rcu_torture_writer() function adapts to requested testing from module
parameters as well as the function pointers in the structure referenced
by cur_ops.  However, as long as the module parameters do not conflict
with the function pointers, this adaptation is silent.  This silence can
result in confusion as to exactly what was tested, which could in turn
result in untested RCU code making its way into mainline.

This commit therefore makes rcu_torture_writer() announce exactly which
portions of RCU's API it ends up testing.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:58 -08:00
Paul E. McKenney
f7c0e6ad4b rcutorture: Re-enable testing of dynamic expediting
During boot, normal grace periods are processed as expedited.  When
rcutorture is built into the kernel, it starts during boot and thus
detects that normal grace periods are unconditionally expedited.
Therefore, rcutorture concludes that there is no point in trying
to dynamically enable expediting, do it disables this aspect of testing,
which is a bit of an overreaction to the temporary boot-time expediting.

This commit therefore rechecks forced expediting throughout the test,
enabling dynamic expediting if normal grace periods are processed
normally at any point.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:57 -08:00
Paul E. McKenney
eb0339934f rcutorture: Avoid fake-writer use of undefined primitives
Currently the rcu_torture_fakewriter() function invokes cur_ops->sync()
and cur_ops->exp_sync() without first checking to see if they are in
fact non-NULL.  This results in kernel NULL pointer dereferences when
testing RCU implementations that choose not to provide the full set of
primitives.  Given that it is perfectly reasonable to have specialized
RCU implementations that provide only a subset of the RCU API, this is
a bug in rcutorture.

This commit therefore makes rcu_torture_fakewriter() check function
pointers before invoking them, thus allowing it to test subsetted
RCU implementations.

Reported-by: Lihao Liang <lianglihao@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:56 -08:00
Paul E. McKenney
e0d31a34c6 rcutorture: Abstract function and module names
This commit moves to __func__ for function names and for KBUILD_MODNAME
for module names, all in the name of better resilience to change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:56 -08:00
Paul E. McKenney
68a675d433 rcutorture: Replace multi-instance kzalloc() with kcalloc()
This commit replaces array-allocation calls to kzalloc() with
equivalent calls to kcalloc().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:55 -08:00
Paul E. McKenney
6308f34775 rcu: Remove SRCU throttling
The code in srcu_gp_end() inserts a delay every 0x3ff grace periods in
order to prevent SRCU grace-period work from consuming an entire CPU
when there is a long sequence of expedited SRCU grace-period requests.
However, all of SRCU's grace-period work is carried out in workqueues,
which are in turn within kthreads, which are automatically throttled as
needed by the scheduler.  In particular, if there is plenty of idle time,
there is no point in throttling.

This commit therefore removes the expedited SRCU grace-period throttling.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:13 -08:00
Byungchul Park
a72da917f1 srcu: Remove dead code in srcu_gp_end()
Of course, compilers will optimize out a dead code. Anyway, remove
any dead code for better readibility.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:12 -08:00
Ildar Ismagilov
8ddbd8832d srcu: Reduce scans of srcu_data in counter wrap check
Currently, given a multi-level srcu_node tree, SRCU can scan the full
set of srcu_data structures at each level when cleaning up after a grace
period.  This, though harmless otherwise, represents pointless overhead.
This commit therefore eliminates this overhead by scanning the srcu_data
structures only when traversing the leaf srcu_node structures.

Signed-off-by: Ildar Ismagilov <devix84@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:12 -08:00
Ildar Ismagilov
a35d13ec36 srcu: Prevent sdp->srcu_gp_seq_needed_exp counter wrap
SRCU checks each srcu_data structure's grace-period number for counter
wrap four times per cycle by default.  This frequency guarantees that
normal comparisons will detect potential wrap.  However, the expedited
grace-period number is not checked.  The consquences are not too horrible
(a failure to expedite a grace period when requested), but it would be
good to avoid such things.  This commit therefore adds this check to
the expedited grace-period number.

Signed-off-by: Ildar Ismagilov <devix84@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:11 -08:00
Paul E. McKenney
cb4081cd4e srcu: Abstract function name
This commit moves to __func__ for function names in the name of better
resilience to change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:21:11 -08:00
Paul E. McKenney
65963d2461 rcu: Make expedited RCU CPU selection avoid unnecessary stores
This commit reworks the first loop in sync_rcu_exp_select_cpus()
to avoid doing unnecssary stores to other CPUs' rcu_data
structures.  This speeds up that first loop by roughly a factor of
two on an old x86 system.  In the case where the system is mostly
idle, this loop incurs a large fraction of the overhead of the
synchronize_rcu_expedited().  There is less benefit on busy systems
because the overhead of the smp_call_function_single() in the second
loop dominates in that case.

However, it is not unusual to do configuration chances involving
RCU grace periods (both expedited and normal) while the system is
mostly idle, so this optimization is worth doing.

While we are in the area, this commit also adds parentheses to arguments
used by the for_each_leaf_node_possible_cpu() macro.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:29 -08:00
Paul E. McKenney
7f5d42d051 rcu: Trace expedited GP delays due to transitioning CPUs
If a CPU is transitioning to or from offline state, an expedited
grace period may undergo a timed wait.  This timed wait can unduly
delay grace periods, so this commit adds a trace statement to make
it visible.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:28 -08:00
Paul E. McKenney
9a414201ae rcu: Add more tracing of expedited grace periods
This commit adds more tracing of expedited grace periods to enable
improved debugging of slowdowns.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:27 -08:00
Ildar Ismagilov
274afd6bfa rcu: Fix misprint in srcu_funnel_exp_start
The srcu_funnel_exp_start() function checks to see if the srcu_struct
structure's expedited grace period counter needs updating to reflect a
newly arrived request for an expedited SRCU grace period.  Unfortunately,
the check is backwards, so this commit reverses the sense of the test.

Signed-off-by: Ildar Ismagilov <devix84@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:27 -08:00
Matthew Wilcox
a32e01ee68 rcu: Use wrapper for lockdep asserts
Commits c0b334c5bf and ea9b0c8a26 introduced new sparse warnings
by accessing rcu_node->lock directly and ignoring the __private
marker.  Introduce a new wrapper and use it.  Also fix a similar problem
in srcutree.c introduced by a3883df393.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:12:26 -08:00
Liu, Changcheng
65518db86b rcu: Remove redundant nxttail index macro define
RCU's nxttail has been optimized to be a rcu_segcblist, which is
a multi-tailed linked list with macros defined for the indexes for
each tail.  The indexes have been defined in linux/rcu_segcblist.h,
so this commit removes the redundant definitions in kernel/rcu/tree.h.

Signed-off-by: Liu Changcheng <changcheng.liu@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:31 -08:00
Paul E. McKenney
bfbd767d4d rcu: Consolidate rcu.h #ifdefs
The kernel/rcu/rcu.h file has a pair of consecutive #ifdefs on
CONFIG_TINY_RCU, so this commit consolidates them, thus saving a few
lines of code.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:30 -08:00
Paul E. McKenney
d07aee2c03 rcu: More clearly identify grace-period kthread stack dump
It is not always obvious that the stack dump from a starved grace-period
kthread isn't instead that of a CPU stalling the current grace period.
This commit therefore adds a pr_err() flagging these dumps.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:29 -08:00
Paul E. McKenney
d62df57370 rcu: Remove obsolete force-quiescent-state statistics for debugfs
The debugfs interface displayed statistics on RCU-pending checks but
this interface has since been removed.  This commit therefore removes the
no-longer-used rcu_state structure's ->n_force_qs_lh and ->n_force_qs_ngp
fields along with their updates.  (Though the ->n_force_qs_ngp field
was actually not used at all, embarrassingly enough.)

If this information proves necessary in the future, the corresponding
event traces will be added.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:29 -08:00
Paul E. McKenney
01c495f72a rcu: Remove obsolete __rcu_pending() statistics for debugfs
The debugfs interface displayed statistics on RCU-pending checks
but this interface has since been removed.  This commit therefore
removes the no-longer-used rcu_data structure's ->n_rcu_pending,
->n_rp_core_needs_qs, ->n_rp_report_qs, ->n_rp_cb_ready,
->n_rp_cpu_needs_gp, ->n_rp_gp_completed, ->n_rp_gp_started,
->n_rp_nocb_defer_wakeup, and ->n_rp_need_nothing fields along with
their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:28 -08:00
Paul E. McKenney
62df63e048 rcu: Remove obsolete callback-invocation statistics for debugfs
The debugfs interface displayed statistics on RCU callback invocation but
this interface has since been removed.  This commit therefore removes the
no-longer-used rcu_data structure's ->n_cbs_invoked and ->n_nocbs_invoked
fields along with their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:27 -08:00
Paul E. McKenney
bec06785fe rcu: Remove obsolete boost statistics for debugfs
The debugfs interface displayed statistics on RCU priority boosting,
but this interface has since been removed.  This commit therefore
removes the no-longer-used rcu_data structure's ->n_tasks_boosted,
->n_exp_boosts, and ->n_exp_boosts and their updates.

If this information proves necessary in the future, the corresponding
event traces will be added.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:27 -08:00
Tejun Heo
3caa973b7a rcu: Call touch_nmi_watchdog() while printing stall warnings
When RCU stall warning triggers, it can print out a lot of messages
while holding spinlocks.  If the console device is slow (e.g. an
actual or IPMI serial console), it may end up triggering NMI hard
lockup watchdog like the following.

*** CPU printking while holding RCU spinlock

  PID: 4149739  TASK: ffff881a46baa880  CPU: 13  COMMAND: "CPUThreadPool8"
   #0 [ffff881fff945e48] crash_nmi_callback at ffffffff8103f7d0
   #1 [ffff881fff945e58] nmi_handle at ffffffff81020653
   #2 [ffff881fff945eb0] default_do_nmi at ffffffff81020c36
   #3 [ffff881fff945ed0] do_nmi at ffffffff81020d32
   #4 [ffff881fff945ef0] end_repeat_nmi at ffffffff81956a7e
      [exception RIP: io_serial_in+21]
      RIP: ffffffff81630e55  RSP: ffff881fff943b88  RFLAGS: 00000002
      RAX: 000000000000ca00  RBX: ffffffff8230e188  RCX: 0000000000000000
      RDX: 00000000000002fd  RSI: 0000000000000005  RDI: ffffffff8230e188
      RBP: ffff881fff943bb0   R8: 0000000000000000   R9: ffffffff820cb3c4
      R10: 0000000000000019  R11: 0000000000002000  R12: 00000000000026e1
      R13: 0000000000000020  R14: ffffffff820cd398  R15: 0000000000000035
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  --- <NMI exception stack> ---
   #5 [ffff881fff943b88] io_serial_in at ffffffff81630e55
   #6 [ffff881fff943b90] wait_for_xmitr at ffffffff8163175c
   #7 [ffff881fff943bb8] serial8250_console_putchar at ffffffff816317dc
   #8 [ffff881fff943bd8] uart_console_write at ffffffff8162ac00
   #9 [ffff881fff943c08] serial8250_console_write at ffffffff81634691
  #10 [ffff881fff943c80] univ8250_console_write at ffffffff8162f7c2
  #11 [ffff881fff943c90] console_unlock at ffffffff810dfc55
  #12 [ffff881fff943cf0] vprintk_emit at ffffffff810dffb5
  #13 [ffff881fff943d50] vprintk_default at ffffffff810e01bf
  #14 [ffff881fff943d60] vprintk_func at ffffffff810e1127
  #15 [ffff881fff943d70] printk at ffffffff8119a8a4
  #16 [ffff881fff943dd0] print_cpu_stall_info at ffffffff810eb78c
  #17 [ffff881fff943e88] rcu_check_callbacks at ffffffff810ef133
  #18 [ffff881fff943ee8] update_process_times at ffffffff810f3497
  #19 [ffff881fff943f10] tick_sched_timer at ffffffff81103037
  #20 [ffff881fff943f38] __hrtimer_run_queues at ffffffff810f3f38
  #21 [ffff881fff943f88] hrtimer_interrupt at ffffffff810f442b

*** CPU triggering the hardlockup watchdog

  PID: 4149709  TASK: ffff88010f88c380  CPU: 26  COMMAND: "CPUThreadPool35"
   #0 [ffff883fff1059d0] machine_kexec at ffffffff8104a874
   #1 [ffff883fff105a30] __crash_kexec at ffffffff811116cc
   #2 [ffff883fff105af0] __crash_kexec at ffffffff81111795
   #3 [ffff883fff105b08] panic at ffffffff8119a6ae
   #4 [ffff883fff105b98] watchdog_overflow_callback at ffffffff81135dbd
   #5 [ffff883fff105bb0] __perf_event_overflow at ffffffff81186866
   #6 [ffff883fff105be8] perf_event_overflow at ffffffff81192bc4
   #7 [ffff883fff105bf8] intel_pmu_handle_irq at ffffffff8100b265
   #8 [ffff883fff105df8] perf_event_nmi_handler at ffffffff8100489f
   #9 [ffff883fff105e58] nmi_handle at ffffffff81020653
  #10 [ffff883fff105eb0] default_do_nmi at ffffffff81020b94
  #11 [ffff883fff105ed0] do_nmi at ffffffff81020d32
  #12 [ffff883fff105ef0] end_repeat_nmi at ffffffff81956a7e
      [exception RIP: queued_spin_lock_slowpath+248]
      RIP: ffffffff810da958  RSP: ffff883fff103e68  RFLAGS: 00000046
      RAX: 0000000000000000  RBX: 0000000000000046  RCX: 00000000006d0000
      RDX: ffff883fff49a950  RSI: 0000000000d10101  RDI: ffffffff81e54300
      RBP: ffff883fff103e80   R8: ffff883fff11a950   R9: 0000000000000000
      R10: 000000000e5873ba  R11: 000000000000010f  R12: ffffffff81e54300
      R13: 0000000000000000  R14: ffff88010f88c380  R15: ffffffff81e54300
      ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
  --- <NMI exception stack> ---
  #13 [ffff883fff103e68] queued_spin_lock_slowpath at ffffffff810da958
  #14 [ffff883fff103e70] _raw_spin_lock_irqsave at ffffffff8195550b
  #15 [ffff883fff103e88] rcu_check_callbacks at ffffffff810eed18
  #16 [ffff883fff103ee8] update_process_times at ffffffff810f3497
  #17 [ffff883fff103f10] tick_sched_timer at ffffffff81103037
  #18 [ffff883fff103f38] __hrtimer_run_queues at ffffffff810f3f38
  #19 [ffff883fff103f88] hrtimer_interrupt at ffffffff810f442b
  --- <IRQ stack> ---

Avoid spuriously triggering NMI hardlockup watchdog by touching it
from the print functions.  show_state_filter() shares the same problem
and solution.

v2: Relocate the comment to where it belongs.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:26 -08:00
Paul E. McKenney
3016611eed rcu: Fix CPU offload boot message when no CPUs are offloaded
In CONFIG_RCU_NOCB_CPU=y kernels, if the boot parameters indicate that
none of the CPUs should in fact be offloaded, the following somewhat
obtuse message appears:

	Offload RCU callbacks from CPUs: .

This commit therefore makes the message at least grammatically correct
in this case:

	Offload RCU callbacks from CPUs: (none)

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-20 16:10:19 -08:00
Lihao Liang
398953e62c rcu: Remove unnecessary spinlock in rcu_boot_init_percpu_data()
Since rcu_boot_init_percpu_data() is only called at boot time,
there is no data race and spinlock is not needed.

Signed-off-by: Lihao Liang <lianglihao@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2018-02-15 15:40:36 -08:00
Linus Torvalds
28bc6fb959 SCSI misc on 20180131
This is mostly updates of the usual driver suspects: arcmsr,
 scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
 hisi_sas.  We also have a rework of the libsas hotplug handling to
 make it more robust, a slew of 32 bit time conversions and fixes, and
 a host of the usual minor updates and style changes.  The biggest
 potential for regressions is the libsas hotplug changes, but so far
 they seem stable under testing.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCWnH+5SYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishWxuAP0UvuJp
 MNR/yU/wv/emSzOc48Ldwd7I0xD2XxSnloGUgwD+IGZZT5yNUQA1THCbm+en4hkB
 WvyBieQs9qRit+2czd4=
 =gJMf
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This is mostly updates of the usual driver suspects: arcmsr,
  scsi_debug, mpt3sas, lpfc, cxlflash, qla2xxx, aacraid, megaraid_sas,
  hisi_sas.

  We also have a rework of the libsas hotplug handling to make it more
  robust, a slew of 32 bit time conversions and fixes, and a host of the
  usual minor updates and style changes. The biggest potential for
  regressions is the libsas hotplug changes, but so far they seem stable
  under testing"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (313 commits)
  scsi: qla2xxx: Fix logo flag for qlt_free_session_done()
  scsi: arcmsr: avoid do_gettimeofday
  scsi: core: Add VENDOR_SPECIFIC sense code definitions
  scsi: qedi: Drop cqe response during connection recovery
  scsi: fas216: fix sense buffer initialization
  scsi: ibmvfc: Remove unneeded semicolons
  scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()
  scsi: hisi_sas: directly attached disk LED feature for v2 hw
  scsi: hisi_sas: devicetree: bindings: add LED feature for v2 hw
  scsi: megaraid_sas: NVMe passthrough command support
  scsi: megaraid: use ktime_get_real for firmware time
  scsi: fnic: use 64-bit timestamps
  scsi: qedf: Fix error return code in __qedf_probe()
  scsi: devinfo: fix format of the device list
  scsi: qla2xxx: Update driver version to 10.00.00.05-k
  scsi: qla2xxx: Add XCB counters to debugfs
  scsi: qla2xxx: Fix queue ID for async abort with Multiqueue
  scsi: qla2xxx: Fix warning for code intentation in __qla24xx_handle_gpdb_event()
  scsi: qla2xxx: Fix warning during port_name debug print
  scsi: qla2xxx: Fix warning in qla2x00_async_iocb_timeout()
  ...
2018-01-31 11:23:28 -08:00
Paul E. McKenney
1dfa55e019 Merge branches 'cond_resched.2017.12.04a', 'dyntick.2017.11.28a', 'fixes.2017.12.11a', 'srbd.2017.12.05a' and 'torture.2017.12.11a' into HEAD
cond_resched.2017.12.04a: Convert cond_resched_rcu_qs() to cond_resched()
dyntick.2017.11.28a: Make RCU dynticks handle interrupts from NMI
fixes.2017.12.11a: Miscellaneous fixes
srbd.2017.12.05a: Remove now-redundant smp_read_barrier_depends()
torture.2017.12.11a: Torture-testing update
2017-12-11 09:21:58 -08:00
Paul E. McKenney
a2f2577d96 torture: Eliminate torture_runnable and perf_runnable
The purpose of torture_runnable is to allow rcutorture and locktorture
to be started and stopped via sysfs when they are built into the kernel
(as in not compiled as loadable modules).  However, the 0444 permissions
for both instances of torture_runnable prevent this use case from ever
being put into practice.  Given that there have been no complaints
about this deficiency, it is reasonable to conclude that no one actually
makes use of this sysfs capability.  The perf_runnable module parameter
for rcuperf is in the same situation.

This commit therefore removes both torture_runnable instances as well
as perf_runnable.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:18:29 -08:00
Paul E. McKenney
e8302739aa rcutorture: Preempt RCU-preempt readers more vigorously
This commit attempts to make a very rare rcutorture failure happen
more often by increasing the fraction of RCU-preempt read-side critical
sections that are preempted.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:18:22 -08:00
Paul E. McKenney
cc1321c96f torture: Reduce #ifdefs for preempt_schedule()
This commit adds a torture_preempt_schedule() that is nothingness
in !PREEMPT builds and is preempt_schedule() otherwise.  Then
torture_preempt_schedule() is used to eliminate several ugly #ifdefs,
both in rcutorture and in locktorture.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:18:22 -08:00
Rakib Mullick
84b12b752f rcu: Remove have_rcu_nocb_mask from tree_plugin.h
Currently have_rcu_nocb_mask is used to avoid double allocation of
rcu_nocb_mask during boot up. Due to different representation of
cpumask_var_t on different kernel config CPUMASK=y(or n) it was okay.
But now we have a helper cpumask_available(), which can be utilized
to check whether rcu_nocb_mask has been allocated or not without using
a variable.

Removing the variable also reduces vmlinux size.

Unpatched version:
text	   data	    bss	    dec	    hex	filename
13050393	7852470	14543408	35446271	21cddff	vmlinux

Patched version:
 text	   data	    bss	    dec	    hex	filename
13050390	7852438	14543408	35446236	21cdddc	vmlinux

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:17:40 -08:00
Paul E. McKenney
efd88b02bb rcu: Add comment giving debug strategy for double call_rcu()
The following statement has for some reason proven non-intuitive:

	WARN_ON_ONCE(rcu_segcblist_empty(&rdp->cblist) != (count == 0));

This commit therefore adds a comment that states that this warning
usually triggers in response to a double call_rcu(), which is sort
of like a double free.  The comment also suggests building with
CONFIG_DEBUG_OBJECTS_RCU_HEAD=y to track down the double call_rcu().

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-12-11 09:17:39 -08:00
Paul E. McKenney
156baec397 rcu: Export init_rcu_head() and destroy_rcu_head() to GPL modules
Use of init_rcu_head() and destroy_rcu_head() from modules results in
the following build-time error with CONFIG_DEBUG_OBJECTS_RCU_HEAD=y:

	ERROR: "init_rcu_head" [drivers/scsi/scsi_mod.ko] undefined!
	ERROR: "destroy_rcu_head" [drivers/scsi/scsi_mod.ko] undefined!

This commit therefore adds EXPORT_SYMBOL_GPL() for each to allow them to
be used by GPL-licensed kernel modules.

Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-12-07 19:51:49 -05:00
Paul E. McKenney
d633198088 srcu: Prohibit call_srcu() use under raw spinlocks
Invoking queue_delayed_work() while holding a raw spinlock is forbidden
in -rt kernels, which is exactly what __call_srcu() does, indirectly via
srcu_funnel_gp_start().  This commit therefore downgrades Tree SRCU's
locking from raw to non-raw spinlocks, which works because call_srcu()
is not ever called while holding a raw spinlock.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:52:33 -08:00
Paul E. McKenney
e68bbb266d rcu: Simplify rcu_eqs_{enter,exit}() non-idle task debug code
The code that checks for non-idle non-nohz_idle-usermode tasks invoking
rcu_eqs_enter() and rcu_eqs_exit() prints a considerable quantity of
helpful information.  However, these checks fire rarely, so the extra
complexity is no longer worth it.  This commit therefore replaces this
debug code with simple WARN_ON_ONCE() statements.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:21 -08:00
Paul E. McKenney
9dd238e286 rcu: Fold rcu_eqs_exit_common() into rcu_eqs_exit()
There is now only one call to rcu_eqs_exit_common() and there is no other
reason to keep it separate.  This commit therefore inlines it into its
sole call site, saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:21 -08:00
Paul E. McKenney
215bba9f59 rcu: Fold rcu_eqs_enter_common() into rcu_eqs_enter()
There is now only one call to rcu_eqs_enter_common() and there is no other
reason to keep it separate.  This commit therefore inlines it into its
sole call site, saving a few lines of code in the process.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:20 -08:00
Paul E. McKenney
2342172fd6 rcu: Avoid ->dynticks_nesting store tearing
Although ->dynticks_nesting is updated only by process level, it is
accessed from hardirq to check for interrupt-from-idle quiescent states.
Store tearing is thus possible, so this commit applies WRITE_ONCE()
to ->dynticks_nesting stores.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:20 -08:00
Paul E. McKenney
914955e18c rcu: Stop duplicating lockdep checks in RCU's idle-entry code
The three RCU_LOCKDEP_WARN() calls in rcu_eqs_enter_common() are
redundant with other lockdep checks, so this commit removes them.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:19 -08:00
Paul E. McKenney
dec98900ea rcu: Add ->dynticks field to rcu_dyntick trace event
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:19 -08:00
Paul E. McKenney
84585aa8b6 rcu: Shrink ->dynticks_{nmi_,}nesting from long long to long
Because the ->dynticks_nesting field now only contains the process-based
nesting level instead of a value encoding both the process nesting level
and the irq "nesting" level, we no longer need a long long, even on
32-bit systems.  This commit therefore changes both the ->dynticks_nesting
and ->dynticks_nmi_nesting fields to long.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:18 -08:00
Paul E. McKenney
bd2b879a1c rcu: Add tracing to irq/NMI dyntick-idle transitions
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-28 15:51:18 -08:00
Paul E. McKenney
844ccdd7dc rcu: Eliminate rcu_irq_enter_disabled()
Now that the irq path uses the rcu_nmi_{enter,exit}() algorithm,
rcu_irq_enter() and rcu_irq_exit() may be used from any context.  There is
thus no need for rcu_irq_enter_disabled() and for the checks using it.
This commit therefore eliminates rcu_irq_enter_disabled().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
51a1fd30f1 rcu: Make ->dynticks_nesting be a simple counter
Now that ->dynticks_nesting counts only process-level dyntick-idle
entry and exit, there is no need for the elaborate segmented counter
with its guard fields and overflow checking.  This commit therefore
makes ->dynticks_nesting be a simple counter.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
58721f5da4 rcu: Define rcu_irq_{enter,exit}() in terms of rcu_nmi_{enter,exit}()
RCU currently uses two different mechanisms for tracking irqs and NMIs.
This is unnecessary complexity: Given that NMIs can nest and given that
RCU's tracking handles such nesting, the NMI tracking mechanism can also
be used to track irqs.  This commit therefore defines rcu_irq_enter()
in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_nmi_exit().

Unfortunately, callers must still distinguish between the irq and NMI
functions because additional actions are taken when an irq interrupts
idle or nohz_full usermode execution, and these actions cannot always
be taken from NMI handlers.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:42:03 -08:00
Paul E. McKenney
6136d6e48a rcu: Clamp ->dynticks_nmi_nesting at eqs entry/exit
In preparation for merging dyntick-idle irq handling into the NMI
algorithm, clamp ->dynticks_nmi_nesting value to allow for interrupts
that enter but never leave and vice versa.

It is important that the clamping happen outside of the extended quiescent
state.  Otherwise, there will be short windows where irqs and NMIs fail
to convince RCU to start watching.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00
Paul E. McKenney
fd581a91ac rcu: Move rcu_nmi_{enter,exit}() to prepare for consolidation
This is a code-motion-only commit that prepares to define rcu_irq_enter()
in terms of rcu_nmi_enter() and rcu_irq_exit() in terms of rcu_irq_exit().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2017-11-27 08:40:10 -08:00