kernel_optimize_test/block
Paolo Valente 8f9bebc33d block, bfq: access and cache blkg data only when safe
In blk-cgroup, operations on blkg objects are protected with the
request_queue lock. This is no more the lock that protects
I/O-scheduler operations in blk-mq. In fact, the latter are now
protected with a finer-grained per-scheduler-instance lock. As a
consequence, although blkg lookups are also rcu-protected, blk-mq I/O
schedulers may see inconsistent data when they access blkg and
blkg-related objects. BFQ does access these objects, and does incur
this problem, in the following case.

The blkg_lookup performed in bfq_get_queue, being protected (only)
through rcu, may happen to return the address of a copy of the
original blkg. If this is the case, then the blkg_get performed in
bfq_get_queue, to pin down the blkg, is useless: it does not prevent
blk-cgroup code from destroying both the original blkg and all objects
directly or indirectly referred by the copy of the blkg. BFQ accesses
these objects, which typically causes a crash for NULL-pointer
dereference of memory-protection violation.

Some additional protection mechanism should be added to blk-cgroup to
address this issue. In the meantime, this commit provides a quick
temporary fix for BFQ: cache (when safe) blkg data that might
disappear right after a blkg_lookup.

In particular, this commit exploits the following facts to achieve its
goal without introducing further locks.  Destroy operations on a blkg
invoke, as a first step, hooks of the scheduler associated with the
blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
consequence, for any blkg associated with the request queue an
instance of BFQ is attached to, we are guaranteed that such a blkg is
not destroyed, and that all the pointers it contains are consistent,
while that instance is holding its bfqd->lock. A blkg_lookup performed
with bfqd->lock held then returns a fully consistent blkg, which
remains consistent until this lock is held. In more detail, this holds
even if the returned blkg is a copy of the original one.

Finally, also the object describing a group inside BFQ needs to be
protected from destruction on the blkg_free of the original blkg
(which invokes bfq_pd_free). This commit adds private refcounting for
this object, to let it disappear only after no bfq_queue refers to it
any longer.

This commit also removes or updates some stale comments on locking
issues related to blk-cgroup operations.

Reported-by: Tomas Konir <tomas.konir@gmail.com>
Reported-by: Lee Tibbert <lee.tibbert@gmail.com>
Reported-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Tomas Konir <tomas.konir@gmail.com>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-08 09:51:10 -06:00
..
partitions
badblocks.c
bfq-cgroup.c
bfq-iosched.c
bfq-iosched.h
bfq-wf2q.c
bio-integrity.c
bio.c
blk-cgroup.c
blk-core.c
blk-exec.c
blk-flush.c
blk-integrity.c
blk-ioc.c
blk-lib.c
blk-map.c
blk-merge.c
blk-mq-cpumap.c
blk-mq-debugfs.c
blk-mq-debugfs.h
blk-mq-pci.c
blk-mq-sched.c
blk-mq-sched.h
blk-mq-sysfs.c
blk-mq-tag.c
blk-mq-tag.h
blk-mq-virtio.c
blk-mq.c
blk-mq.h
blk-settings.c
blk-softirq.c
blk-stat.c
blk-stat.h
blk-sysfs.c
blk-tag.c
blk-throttle.c
blk-timeout.c
blk-wbt.c
blk-wbt.h
blk-zoned.c
blk.h
bounce.c
bsg-lib.c
bsg.c
cfq-iosched.c
cmdline-parser.c
compat_ioctl.c
deadline-iosched.c
elevator.c
genhd.c
ioctl.c
ioprio.c
Kconfig
Kconfig.iosched
kyber-iosched.c
Makefile
mq-deadline.c
noop-iosched.c
opal_proto.h
partition-generic.c
scsi_ioctl.c
sed-opal.c
t10-pi.c