kernel_optimize_test/block
Paolo Valente 8f9bebc33d block, bfq: access and cache blkg data only when safe
In blk-cgroup, operations on blkg objects are protected with the
request_queue lock. This is no more the lock that protects
I/O-scheduler operations in blk-mq. In fact, the latter are now
protected with a finer-grained per-scheduler-instance lock. As a
consequence, although blkg lookups are also rcu-protected, blk-mq I/O
schedulers may see inconsistent data when they access blkg and
blkg-related objects. BFQ does access these objects, and does incur
this problem, in the following case.

The blkg_lookup performed in bfq_get_queue, being protected (only)
through rcu, may happen to return the address of a copy of the
original blkg. If this is the case, then the blkg_get performed in
bfq_get_queue, to pin down the blkg, is useless: it does not prevent
blk-cgroup code from destroying both the original blkg and all objects
directly or indirectly referred by the copy of the blkg. BFQ accesses
these objects, which typically causes a crash for NULL-pointer
dereference of memory-protection violation.

Some additional protection mechanism should be added to blk-cgroup to
address this issue. In the meantime, this commit provides a quick
temporary fix for BFQ: cache (when safe) blkg data that might
disappear right after a blkg_lookup.

In particular, this commit exploits the following facts to achieve its
goal without introducing further locks.  Destroy operations on a blkg
invoke, as a first step, hooks of the scheduler associated with the
blkg. And these hooks are executed with bfqd->lock held for BFQ. As a
consequence, for any blkg associated with the request queue an
instance of BFQ is attached to, we are guaranteed that such a blkg is
not destroyed, and that all the pointers it contains are consistent,
while that instance is holding its bfqd->lock. A blkg_lookup performed
with bfqd->lock held then returns a fully consistent blkg, which
remains consistent until this lock is held. In more detail, this holds
even if the returned blkg is a copy of the original one.

Finally, also the object describing a group inside BFQ needs to be
protected from destruction on the blkg_free of the original blkg
(which invokes bfq_pd_free). This commit adds private refcounting for
this object, to let it disappear only after no bfq_queue refers to it
any longer.

This commit also removes or updates some stale comments on locking
issues related to blk-cgroup operations.

Reported-by: Tomas Konir <tomas.konir@gmail.com>
Reported-by: Lee Tibbert <lee.tibbert@gmail.com>
Reported-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Tomas Konir <tomas.konir@gmail.com>
Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
Tested-by: Marco Piazza <mpiazza@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-08 09:51:10 -06:00
..
partitions partitions/msdos: FreeBSD UFS2 file systems are not recognized 2017-05-23 09:16:07 -06:00
badblocks.c badblocks: badblocks_set/clear update unacked_exist 2016-10-21 15:45:47 -06:00
bfq-cgroup.c block, bfq: access and cache blkg data only when safe 2017-06-08 09:51:10 -06:00
bfq-iosched.c block, bfq: access and cache blkg data only when safe 2017-06-08 09:51:10 -06:00
bfq-iosched.h block, bfq: access and cache blkg data only when safe 2017-06-08 09:51:10 -06:00
bfq-wf2q.c block, bfq: use pointer entity->sched_data only if set 2017-05-10 07:39:43 -06:00
bio-integrity.c bio-integrity: Do not allocate integrity context for bio w/o data 2017-06-03 07:36:27 -06:00
bio.c Merge branch 'md-next' into md-linus 2017-05-01 14:09:21 -07:00
blk-cgroup.c block: Avoid that blk_exit_rl() triggers a use-after-free 2017-06-01 13:07:55 -06:00
blk-core.c block: Avoid that blk_exit_rl() triggers a use-after-free 2017-06-01 13:07:55 -06:00
blk-exec.c block: remove the errors field from struct request 2017-04-20 12:16:10 -06:00
blk-flush.c block: make __blk_end_bidi_request private 2017-04-19 10:19:47 -06:00
blk-integrity.c block: fix blk_integrity_register to use template's interval_exp if not 0 2017-04-23 12:59:56 -06:00
blk-ioc.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2017-03-03 10:53:35 -08:00
blk-lib.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
blk-map.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/task_stack.h> 2017-03-02 08:42:36 +01:00
blk-merge.c block: implement splitting of REQ_OP_WRITE_ZEROES bios 2017-04-08 11:25:38 -06:00
blk-mq-cpumap.c blk-mq: export blk_mq_map_queues 2016-11-08 17:30:00 -05:00
blk-mq-debugfs.c mq-deadline: add debugfs attributes 2017-05-04 08:25:17 -06:00
blk-mq-debugfs.h mq-deadline: add debugfs attributes 2017-05-04 08:25:17 -06:00
blk-mq-pci.c blk-mq-pci: Fix two spelling mistakes 2017-03-29 11:09:51 -06:00
blk-mq-sched.c blk-mq-debugfs: allow schedulers to register debugfs attributes 2017-05-04 08:24:40 -06:00
blk-mq-sched.h blk-mq: Remove blk_mq_sched_move_to_dispatch() 2017-04-20 17:28:30 -06:00
blk-mq-sysfs.c blk-mq: untangle debugfs and sysfs 2017-05-04 08:24:13 -06:00
blk-mq-tag.c blk-mq: add shallow depth option for blk_mq_get_tag() 2017-04-14 14:06:54 -06:00
blk-mq-tag.h blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset 2017-03-02 08:56:04 -07:00
blk-mq-virtio.c blk-mq: provide a default queue mapping for virtio device 2017-02-27 20:54:05 +02:00
blk-mq.c blk-mq: fix direct issue 2017-06-06 10:00:35 -06:00
blk-mq.h blk-mq: move debugfs declarations to a separate header file 2017-05-04 08:23:44 -06:00
blk-settings.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
blk-softirq.c sched/headers: Prepare for new header dependencies before moving code to <linux/sched/topology.h> 2017-03-02 08:42:26 +01:00
blk-stat.c blk-stat: don't use this_cpu_ptr() in a preemptable section 2017-05-10 07:40:18 -06:00
blk-stat.h blk-stat: kill blk_stat_rq_ddir() 2017-04-21 07:56:23 -06:00
blk-sysfs.c block: Avoid that blk_exit_rl() triggers a use-after-free 2017-06-01 13:07:55 -06:00
blk-tag.c blk-mq-sched: add framework for MQ capable IO schedulers 2017-01-17 10:04:20 -07:00
blk-throttle.c blk-throttle: set default latency baseline for harddisk 2017-06-07 09:09:32 -06:00
blk-timeout.c block: remove the errors field from struct request 2017-04-20 12:16:10 -06:00
blk-wbt.c blk-stat: kill blk_stat_rq_ddir() 2017-04-21 07:56:23 -06:00
blk-wbt.h block: Make writeback throttling defaults consistent for SQ devices 2017-04-19 08:49:03 -06:00
blk-zoned.c block: Rename blk_queue_zone_size and bdev_zone_size 2017-01-12 07:58:32 -07:00
blk.h block: Avoid that blk_exit_rl() triggers a use-after-free 2017-06-01 13:07:55 -06:00
bounce.c
bsg-lib.c scsi: introduce a result field in struct scsi_request 2017-04-20 12:16:10 -06:00
bsg.c Merge branch 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-05-01 14:41:04 -07:00
cfq-iosched.c cfq-iosched: fix the delay of cfq_group's vdisktime under iops mode 2017-05-31 09:25:21 -06:00
cmdline-parser.c
compat_ioctl.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
deadline-iosched.c block: enumify ELEVATOR_*_MERGE 2017-02-08 13:43:06 -07:00
elevator.c elevator: remove redundant warnings on IO scheduler switch 2017-05-10 07:40:04 -06:00
genhd.c A reasonably busy cycle for documentation this time around. There is a new 2017-05-02 10:21:17 -07:00
ioctl.c block: remove the discard_zeroes_data flag 2017-04-08 11:25:38 -06:00
ioprio.c block: Optimize ioprio_best() 2017-04-19 17:38:36 -06:00
Kconfig Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm 2017-05-12 15:43:10 -07:00
Kconfig.iosched block, bfq: add full hierarchical scheduling and cgroups support 2017-04-19 08:30:26 -06:00
kyber-iosched.c kyber: add debugfs attributes 2017-05-04 08:25:17 -06:00
Makefile block, bfq: split bfq-iosched.c into multiple source files 2017-04-19 08:48:24 -06:00
mq-deadline.c mq-deadline: add debugfs attributes 2017-05-04 08:25:17 -06:00
noop-iosched.c block: move existing elevator ops to union 2017-01-17 10:03:33 -07:00
opal_proto.h block/sed-opal: allocate struct opal_dev dynamically 2017-02-17 12:41:47 -07:00
partition-generic.c block: fix an error code in add_partition() 2017-05-23 08:41:59 -06:00
scsi_ioctl.c scsi: introduce a result field in struct scsi_request 2017-04-20 12:16:10 -06:00
sed-opal.c block: sed-opal: Tone down all the pr_* to debugs 2017-04-07 14:24:16 -06:00
t10-pi.c block: constify struct blk_integrity_profile 2017-03-24 20:34:39 -06:00