kernel_optimize_test

Tejun Heo 36c38fb714 blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats()		2014-05-05 13:48:18 -04:00

During the recent conversion of cgroup to kernfs, cgroup_tree_mutex
which nests above both the kernfs s_active protection and cgroup_mutex
is added to synchronize cgroup file type operations as cgroup_mutex
needed to be grabbed from some file operations and thus can't be put
above s_active protection.

While this arrangement mostly worked for cgroup, this triggered the
following lockdep warning.

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429 Tainted: G W
  -------------------------------------------------------
  trinity-c173/9024 is trying to acquire lock:
  (blkcg_pol_mutex){+.+.+.}, at: blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)

  but task is already holding lock:
  (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (s_active#89){++++.+}:
  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
  __kernfs_remove (arch/x86/include/asm/atomic.h:27 fs/kernfs/dir.c:352 fs/kernfs/dir.c:1024)
  kernfs_remove_by_name_ns (fs/kernfs/dir.c:1219)
  cgroup_addrm_files (include/linux/kernfs.h:427 kernel/cgroup.c:1074 kernel/cgroup.c:2899)
  cgroup_clear_dir (kernel/cgroup.c:1092 (discriminator 2))
  rebind_subsystems (kernel/cgroup.c:1144)
  cgroup_setup_root (kernel/cgroup.c:1568)
  cgroup_mount (kernel/cgroup.c:1716)
  mount_fs (fs/super.c:1094)
  vfs_kern_mount (fs/namespace.c:899)
  do_mount (fs/namespace.c:2238 fs/namespace.c:2561)
  SyS_mount (fs/namespace.c:2758 fs/namespace.c:2729)
  tracesys (arch/x86/kernel/entry_64.S:746)

  -> #1 (cgroup_tree_mutex){+.+.+.}:
  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
  mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
  cgroup_add_cftypes (include/linux/list.h:76 kernel/cgroup.c:3040)
  blkcg_policy_register (block/blk-cgroup.c:1106)
  throtl_init (block/blk-throttle.c:1694)
  do_one_initcall (init/main.c:789)
  kernel_init_freeable (init/main.c:854 init/main.c:863 init/main.c:882 init/main.c:1003)
  kernel_init (init/main.c:935)
  ret_from_fork (arch/x86/kernel/entry_64.S:552)

  -> #0 (blkcg_pol_mutex){+.+.+.}:
  __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
  mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
  blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
  cgroup_file_write (kernel/cgroup.c:2714)
  kernfs_fop_write (fs/kernfs/file.c:295)
  vfs_write (fs/read_write.c:532)
  SyS_write (fs/read_write.c:584 fs/read_write.c:576)
  tracesys (arch/x86/kernel/entry_64.S:746)

  other info that might help us debug this:

  Chain exists of:
  blkcg_pol_mutex --> cgroup_tree_mutex --> s_active#89

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(s_active#89);
                                lock(cgroup_tree_mutex);
                                lock(s_active#89);
   lock(blkcg_pol_mutex);

  *** DEADLOCK ***

  4 locks held by trinity-c173/9024:
  #0: (&f->f_pos_lock){+.+.+.}, at: __fdget_pos (fs/file.c:714)
  #1: (sb_writers#18){.+.+.+}, at: vfs_write (include/linux/fs.h:2255 fs/read_write.c:530)
  #2: (&of->mutex){+.+.+.}, at: kernfs_fop_write (fs/kernfs/file.c:283)
  #3: (s_active#89){++++.+}, at: kernfs_fop_write (fs/kernfs/file.c:283)

  stack backtrace:
  CPU: 3 PID: 9024 Comm: trinity-c173 Tainted: G W 3.15.0-rc3-next-20140430-sasha-00016-g4e281fa-dirty #429
  ffffffff919687b0 ffff8805f6373bb8 ffffffff8e52cdbb 0000000000000002
  ffffffff919d8400 ffff8805f6373c08 ffffffff8e51fb88 0000000000000004
  ffff8805f6373c98 ffff8805f6373c08 ffff88061be70d98 ffff88061be70dd0
  Call Trace:
  dump_stack (lib/dump_stack.c:52)
  print_circular_bug (kernel/locking/lockdep.c:1216)
  __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
  lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
  mutex_lock_nested (kernel/locking/mutex.c:486 kernel/locking/mutex.c:587)
  blkcg_reset_stats (include/linux/spinlock.h:328 block/blk-cgroup.c:455)
  cgroup_file_write (kernel/cgroup.c:2714)
  kernfs_fop_write (fs/kernfs/file.c:295)
  vfs_write (fs/read_write.c:532)
  SyS_write (fs/read_write.c:584 fs/read_write.c:576)

This is a highly unlikely but valid circular dependency between "echo
1 > blkcg.reset_stats" and cfq module [un]loading. cgroup is going
through further locking update which will remove this complication but
for now let's use trylock on blkcg_pol_mutex and retry the file
operation if the trylock fails.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
References: http://lkml.kernel.org/g/5363C04B.4010400@oracle.com
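
The trylock-and-retry idea in the commit message above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel patch: it uses POSIX mutexes instead of kernel mutexes, and `active_ref`, `pol_mutex`, `stats`, `reset_stats_once()` and `reset_stats()` are hypothetical names standing in for the s_active protection, blkcg_pol_mutex and the reset_stats write handler. The point it shows is that the inner lock is never waited on while the outer one is held, so the circular wait from the lockdep report cannot form.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the locks named in the lockdep report. */
static pthread_mutex_t active_ref = PTHREAD_MUTEX_INITIALIZER; /* plays s_active */
static pthread_mutex_t pol_mutex = PTHREAD_MUTEX_INITIALIZER;  /* plays blkcg_pol_mutex */
static int stats = 42;                                         /* state guarded by pol_mutex */

/*
 * One attempt at the reset. The outer lock is taken unconditionally, as the
 * file-write path would hold it. The inner lock is only tried, never waited
 * on, so this path cannot block while holding the outer lock.
 * Returns true if the reset happened, false if the caller should retry.
 */
static bool reset_stats_once(void)
{
    bool done = false;

    pthread_mutex_lock(&active_ref);

    int rc = pthread_mutex_trylock(&pol_mutex);
    assert(rc == 0 || rc == EBUSY);
    if (rc == 0) {
        stats = 0;
        pthread_mutex_unlock(&pol_mutex);
        done = true;
    }

    /* Drop the outer lock before any retry so the other side can progress. */
    pthread_mutex_unlock(&active_ref);
    return done;
}

/* Retry the whole operation until the trylock succeeds; returns attempts used. */
static int reset_stats(void)
{
    int attempts = 1;

    while (!reset_stats_once()) {
        sched_yield(); /* let the current holder of pol_mutex run */
        attempts++;
    }
    return attempts;
}
```

When nothing else holds `pol_mutex`, `reset_stats()` succeeds on the first attempt. When another path holds it, each attempt backs out completely, releasing `active_ref` first, which is what breaks the lock-order cycle at the cost of occasionally redoing the operation.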
partitions	block: Use macros from compiler.h instead of __attribute__((...))	2014-02-18 12:20:01 -08:00
blk-cgroup.c	blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats()	2014-05-05 13:48:18 -04:00
blk-cgroup.h	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2014-04-03 13:05:42 -07:00
blk-core.c	block: fix regression with block enabled tagging	2014-04-09 21:54:06 -06:00
blk-exec.c	blk-mq: merge blk_mq_insert_request and blk_mq_run_request	2014-03-21 08:57:37 -06:00
blk-flush.c	blk-mq: merge blk_mq_insert_request and blk_mq_run_request	2014-03-21 08:57:37 -06:00
blk-integrity.c	bio-integrity: Convert to bvec_iter	2013-11-23 22:33:50 -08:00
blk-ioc.c	block: Substitute rcu_access_pointer() for rcu_dereference_raw()	2014-02-18 12:21:26 -08:00
blk-iopoll.c	block: remove old blk_iopoll_enabled variable	2014-03-13 09:38:42 -06:00
blk-lib.c	block: add cond_resched() to potentially long running ioctl discard loop	2014-02-12 09:36:37 -07:00
blk-map.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2014-04-12 14:49:50 -07:00
blk-merge.c	block: Explicitly handle discard/write same segments	2014-02-07 13:54:08 -07:00
blk-mq-cpu.c	rt,blk,mq: Make blk_mq_cpu_notify_lock a raw spinlock	2014-03-21 08:57:56 -06:00
blk-mq-cpumap.c	blk-mq: don't dump CPU -> hw queue map on driver load	2014-03-20 13:31:44 -06:00
blk-mq-sysfs.c	blk-mq: don't dump CPU -> hw queue map on driver load	2014-03-20 13:31:44 -06:00
blk-mq-tag.c	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	2014-02-14 10:45:18 -08:00
blk-mq-tag.h	blk-mq: new multi-queue block IO queueing mechanism	2013-10-25 11:56:00 +01:00
blk-mq.c	blk-mq: fix potential stall during CPU unplug with IO pending	2014-04-07 08:17:18 -06:00
blk-mq.h	blk-mq: merge blk_mq_insert_request and blk_mq_run_request	2014-03-21 08:57:37 -06:00
blk-settings.c	bcache/md: Use raid stripe size	2014-01-08 13:05:09 -08:00
blk-softirq.c	block: fix regression with block enabled tagging	2014-04-09 21:54:06 -06:00
blk-sysfs.c	blk-mq: rework flush sequencing logic	2014-02-10 09:29:00 -07:00
blk-tag.c	block: Reserve only one queue tag for sync IO if only 3 tags are available	2013-06-28 21:32:27 +02:00
blk-throttle.c	cgroup: drop const from @buffer of cftype->write_string()	2014-03-19 10:23:54 -04:00
blk-timeout.c	blk-mq: rework I/O completions	2014-02-10 09:27:31 -07:00
blk.h	block: fix regression with block enabled tagging	2014-04-09 21:54:06 -06:00
bsg-lib.c	bsg: Remove unused function bsg_goose_queue()	2012-12-06 14:33:02 +01:00
bsg.c	hlist: drop the node parameter from iterators	2013-02-27 19:10:24 -08:00
cfq-iosched.c	Merge branch 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	2014-04-03 13:05:42 -07:00
cmdline-parser.c	block: remove unrelated header files and export symbol	2014-01-21 20:18:26 -08:00
compat_ioctl.c	kernel-wide: fix missing validations on __get/__put/__copy_to/__copy_from_user()	2013-09-11 15:58:18 -07:00
deadline-iosched.c	block: Stop abusing csd.list for fifo_time	2014-02-24 14:46:32 -08:00
elevator.c	block: fix regression with block enabled tagging	2014-04-09 21:54:06 -06:00
genhd.c	block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)	2013-09-11 13:22:03 -06:00
ioctl.c	block: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO	2013-11-08 09:05:31 -07:00
Kconfig	block: change config option name for cmdline partition parsing	2013-09-30 14:31:02 -07:00
Kconfig.iosched	blkcg: make CONFIG_BLK_CGROUP bool	2012-03-06 21:27:21 +01:00
Makefile	blk-mq: new multi-queue block IO queueing mechanism	2013-10-25 11:56:00 +01:00
noop-iosched.c	elevator: Fix a race in elevator switching	2013-07-03 13:25:24 +02:00
partition-generic.c	Revert "loop: cleanup partitions when detaching loop device"	2013-04-08 10:12:11 +02:00
scsi_ioctl.c	block: Fix memory leak in rw_copy_check_uvector() handling	2014-01-21 20:36:17 -08:00