Commit Graph

21 Commits

Author SHA1 Message Date
Divyesh Shah
a11cdaa7af block: Update to io-controller stats
Changelog from v1:
o Call blkiocg_update_idle_time_stats() at cfq_rq_enqueued() instead of at
  dispatch time.

Changelog from original patchset: (in response to Vivek Goyal's comments)
o group blkiocg_update_blkio_group_dequeue_stats() with other DEBUG functions
o rename blkiocg_update_set_active_queue_stats() to
  blkiocg_update_avg_queue_size_stats()
o s/request/io/ in blkiocg_update_request_add_stats() and
  blkiocg_update_request_remove_stats()
o Call cfq_del_timer() at request dispatch() instead of
  blkiocg_update_idle_time_stats()

Signed-off-by: Divyesh Shah<dpshah@google.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-13 19:59:17 +02:00
Gui Jianfeng
34d0f179d6 io-controller: Add a new interface "weight_device" for IO-Controller
Currently, IO Controller makes use of blkio.weight to assign weight for
all devices. Here a new user interface "blkio.weight_device" is introduced to
assign different weights for different devices. blkio.weight becomes the
default value for devices which are not configured by "blkio.weight_device"

You can use the following format to assigned specific weight for a given
device:
#echo "major:minor weight" > blkio.weight_device

major:minor represents device number.

And you can remove weight for a given device as following:
#echo "major:minor 0" > blkio.weight_device

V1->V2 changes:
- use user interface "weight_device" instead of "policy" suggested by Vivek
- rename some struct suggested by Vivek
- rebase to 2.6-block "for-linus" branch
- remove an useless list_empty check pointed out by Li Zefan
- some trivial typo fix

V2->V3 changes:
- Move policy_*_node() functions up to get rid of forward declarations
- rename related functions by adding prefix "blkio_"

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-13 10:14:20 +02:00
Divyesh Shah
812df48d12 blkio: Add more debug-only per-cgroup stats
1) group_wait_time - This is the amount of time the cgroup had to wait to get a
  timeslice for one of its queues from when it became busy, i.e., went from 0
  to 1 request queued. This is different from the io_wait_time which is the
  cumulative total of the amount of time spent by each IO in that cgroup waiting
  in the scheduler queue. This stat is a great way to find out any jobs in the
  fleet that are being starved or waiting for longer than what is expected (due
  to an IO controller bug or any other issue).
2) empty_time - This is the amount of time a cgroup spends w/o any pending
   requests. This stat is useful when a job does not seem to be able to use its
   assigned disk share by helping check if that is happening due to an IO
   controller bug or because the job is not submitting enough IOs.
3) idle_time - This is the amount of time spent by the IO scheduler idling
   for a given cgroup in anticipation of a better request than the exising ones
   from other queues/cgroups.

All these stats are recorded using start and stop events. When reading these
stats, we do not add the delta between the current time and the last start time
if we're between the start and stop events. We avoid doing this to make sure
that these numbers are always monotonically increasing when read. Since we're
using sched_clock() which may use the tsc as its source, it may induce some
inconsistency (due to tsc resync across cpus) if we included the current delta.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-09 08:36:08 +02:00
Divyesh Shah
cdc1184cf4 blkio: Add io_queued and avg_queue_size stats
These stats are useful for getting a feel for the queue depth of the cgroup,
i.e., how filled up its queues are at a given instant and over the existence of
the cgroup. This ability is useful when debugging problems in the wild as it
helps understand the application's IO pattern w/o having to read through the
userspace code (coz its tedious or just not available) or w/o the ability
to run blktrace (since you may not have root access and/or not want to disturb
performance).

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-09 08:36:08 +02:00
Divyesh Shah
812d402648 blkio: Add io_merged stat
This includes both the number of bios merged into requests belonging to this
cgroup as well as the number of requests merged together.
In the past, we've observed different merging behavior across upstream kernels,
some by design some actual bugs. This stat helps a lot in debugging such
problems when applications report decreased throughput with a new kernel
version.

This needed adding an extra elevator function to capture bios being merged as I
did not want to pollute elevator code with blkiocg knowledge and hence needed
the accounting invocation to come from CFQ.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-09 08:36:07 +02:00
Divyesh Shah
84c124da9f blkio: Changes to IO controller additional stats patches
that include some minor fixes and addresses all comments.

Changelog: (most based on Vivek Goyal's comments)
o renamed blkiocg_reset_write to blkiocg_reset_stats
o more clarification in the documentation on io_service_time and io_wait_time
o Initialize blkg->stats_lock
o rename io_add_stat to blkio_add_stat and declare it static
o use bool for direction and sync
o derive direction and sync info from existing rq methods
o use 12 for major:minor string length
o define io_service_time better to cover the NCQ case
o add a separate reset_stats interface
o make the indexed stats a 2d array to simplify macro and function pointer code
o blkio.time now exports in jiffies as before
o Added stats description in patch description and
  Documentation/cgroup/blkio-controller.txt
o Prefix all stats functions with blkio and make them static as applicable
o replace IO_TYPE_MAX with IO_TYPE_TOTAL
o Moved #define constant to top of blk-cgroup.c
o Pass dev_t around instead of char *
o Add note to documentation file about resetting stats
o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
  statements
o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
  rq_direction() and rq_sync() functions which are used by CFQ and when using
  io-controller at a higher level, bio_* functions can be added.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-09 08:31:19 +02:00
Divyesh Shah
9195291e5f blkio: Increment the blkio cgroup stats for real now
We also add start_time_ns and io_start_time_ns fields to struct request
here to record the time when a request is created and when it is
dispatched to device. We use ns uints here as ms and jiffies are
not very useful for non-rotational media.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-02 08:44:37 +02:00
Divyesh Shah
303a3acb23 blkio: Add io controller stats like
- io_service_time
- io_wait_time
- io_serviced
- io_service_bytes

These stats are accumulated per operation type helping us to distinguish between
read and write, and sync and async IO. This patch does not increment any of
these stats.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-02 08:44:36 +02:00
Divyesh Shah
9a0785b0da blkio: Remove per-cfqq nr_sectors as we'll be passing
that info at request dispatch with other stats now. This patch removes the
existing support for accounting sectors for a blkio_group. This will be added
back differently in the next two patches.

Signed-off-by: Divyesh Shah<dpshah@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-04-02 08:44:36 +02:00
Ben Blum
67523c48aa cgroups: blkio subsystem as module
Modify the Block I/O cgroup subsystem to be able to be built as a module.
As the CFQ disk scheduler optionally depends on blk-cgroup, config options
in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
enhanced to support the new module dependency.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-03-12 15:52:36 -08:00
Gui Jianfeng
024f906616 cfq: Remove useless css reference get
There's no need to take css reference here, for the caller
has already called rcu_read_lock() to prevent cgroup from
being removed.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-26 08:56:15 +01:00
Gui Jianfeng
bcf4dd4342 blk-cgroup: Fix potential deadlock in blk-cgroup
I triggered a lockdep warning as following.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.33-rc2 #1
-------------------------------------------------------
test_io_control/7357 is trying to acquire lock:
 (blkio_list_lock){+.+...}, at: [<c053a990>] blkiocg_weight_write+0x82/0x9e

but task is already holding lock:
 (&(&blkcg->lock)->rlock){......}, at: [<c053a949>] blkiocg_weight_write+0x3b/0x9e

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&(&blkcg->lock)->rlock){......}:
       [<c04583b7>] validate_chain+0x8bc/0xb9c
       [<c0458dba>] __lock_acquire+0x723/0x789
       [<c0458eb0>] lock_acquire+0x90/0xa7
       [<c0692b0a>] _raw_spin_lock_irqsave+0x27/0x5a
       [<c053a4e1>] blkiocg_add_blkio_group+0x1a/0x6d
       [<c053cac7>] cfq_get_queue+0x225/0x3de
       [<c053eec2>] cfq_set_request+0x217/0x42d
       [<c052c8a6>] elv_set_request+0x17/0x26
       [<c0532a0f>] get_request+0x203/0x2c5
       [<c0532ae9>] get_request_wait+0x18/0x10e
       [<c0533470>] __make_request+0x2ba/0x375
       [<c0531985>] generic_make_request+0x28d/0x30f
       [<c0532da7>] submit_bio+0x8a/0x8f
       [<c04d827a>] submit_bh+0xf0/0x10f
       [<c04d91d2>] ll_rw_block+0xc0/0xf9
       [<f86e9705>] ext3_find_entry+0x319/0x544 [ext3]
       [<f86eae58>] ext3_lookup+0x2c/0xb9 [ext3]
       [<c04c3e1b>] do_lookup+0xd3/0x172
       [<c04c56c8>] link_path_walk+0x5fb/0x95c
       [<c04c5a65>] path_walk+0x3c/0x81
       [<c04c5b63>] do_path_lookup+0x21/0x8a
       [<c04c66cc>] do_filp_open+0xf0/0x978
       [<c04c0c7e>] open_exec+0x1b/0xb7
       [<c04c1436>] do_execve+0xbb/0x266
       [<c04081a9>] sys_execve+0x24/0x4a
       [<c04028a2>] ptregs_execve+0x12/0x18

-> #1 (&(&q->__queue_lock)->rlock){..-.-.}:
       [<c04583b7>] validate_chain+0x8bc/0xb9c
       [<c0458dba>] __lock_acquire+0x723/0x789
       [<c0458eb0>] lock_acquire+0x90/0xa7
       [<c0692b0a>] _raw_spin_lock_irqsave+0x27/0x5a
       [<c053dd2a>] cfq_unlink_blkio_group+0x17/0x41
       [<c053a6eb>] blkiocg_destroy+0x72/0xc7
       [<c0467df0>] cgroup_diput+0x4a/0xb2
       [<c04ca473>] dentry_iput+0x93/0xb7
       [<c04ca4b3>] d_kill+0x1c/0x36
       [<c04cb5c5>] dput+0xf5/0xfe
       [<c04c6084>] do_rmdir+0x95/0xbe
       [<c04c60ec>] sys_rmdir+0x10/0x12
       [<c04027cc>] sysenter_do_call+0x12/0x32

-> #0 (blkio_list_lock){+.+...}:
       [<c0458117>] validate_chain+0x61c/0xb9c
       [<c0458dba>] __lock_acquire+0x723/0x789
       [<c0458eb0>] lock_acquire+0x90/0xa7
       [<c06929fd>] _raw_spin_lock+0x1e/0x4e
       [<c053a990>] blkiocg_weight_write+0x82/0x9e
       [<c0467f1e>] cgroup_file_write+0xc6/0x1c0
       [<c04bd2f3>] vfs_write+0x8c/0x116
       [<c04bd7c6>] sys_write+0x3b/0x60
       [<c04027cc>] sysenter_do_call+0x12/0x32

other info that might help us debug this:

1 lock held by test_io_control/7357:
 #0:  (&(&blkcg->lock)->rlock){......}, at: [<c053a949>] blkiocg_weight_write+0x3b/0x9e
stack backtrace:
Pid: 7357, comm: test_io_control Not tainted 2.6.33-rc2 #1
Call Trace:
 [<c045754f>] print_circular_bug+0x91/0x9d
 [<c0458117>] validate_chain+0x61c/0xb9c
 [<c0458dba>] __lock_acquire+0x723/0x789
 [<c0458eb0>] lock_acquire+0x90/0xa7
 [<c053a990>] ? blkiocg_weight_write+0x82/0x9e
 [<c06929fd>] _raw_spin_lock+0x1e/0x4e
 [<c053a990>] ? blkiocg_weight_write+0x82/0x9e
 [<c053a990>] blkiocg_weight_write+0x82/0x9e
 [<c0467f1e>] cgroup_file_write+0xc6/0x1c0
 [<c0454df5>] ? trace_hardirqs_off+0xb/0xd
 [<c044d93a>] ? cpu_clock+0x2e/0x44
 [<c050e6ec>] ? security_file_permission+0xf/0x11
 [<c04bcdda>] ? rw_verify_area+0x8a/0xad
 [<c0467e58>] ? cgroup_file_write+0x0/0x1c0
 [<c04bd2f3>] vfs_write+0x8c/0x116
 [<c04bd7c6>] sys_write+0x3b/0x60
 [<c04027cc>] sysenter_do_call+0x12/0x32

To prevent deadlock, we should take locks as following sequence:

blkio_list_lock -> queue_lock ->  blkcg_lock.

The following patch should fix this bug.

Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-01 09:58:54 +01:00
Stephen Rothwell
accee7854b block: include linux/err.h to use ERR_PTR
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-07 09:47:07 +01:00
Vivek Goyal
3e25206689 blkio: Implement dynamic io controlling policy registration
o One of the goals of block IO controller is that it should be able to
  support mulitple io control policies, some of which be operational at
  higher level in storage hierarchy.

o To begin with, we had one io controlling policy implemented by CFQ, and
  I hard coded the CFQ functions called by blkio. This created issues when
  CFQ is compiled as module.

o This patch implements a basic dynamic io controlling policy registration
  functionality in blkio. This is similar to elevator functionality where
  ioschedulers register the functions dynamically.

o Now in future, when more IO controlling policies are implemented, these
  can dynakically register with block IO controller.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 16:38:14 +01:00
Vivek Goyal
9d6a986c0b blkio: Export some symbols from blkio as its user CFQ can be a module
o blkio controller is inside the kernel and cfq makes use of interfaces
  exported by blkio. CFQ can be a module too, hence export symbols used
  by CFQ.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 16:38:14 +01:00
Jens Axboe
f2eecb9152 cfq-iosched: move IO controller declerations to a header file
They should not be declared inside some other file that's not related
to CFQ.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-04 10:06:35 +01:00
Vivek Goyal
f8d461d692 blkio: Propagate cgroup weight updation to cfq groups
o Propagate blkio cgroup weight updation to associated cfq groups.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:53 +01:00
Vivek Goyal
220841906f blkio: Export disk time and sectors used by a group to user space
o Export disk time and sector used by a group to user space through cgroup
  interface.

o Also export a "dequeue" interface to cgroup which keeps track of how many
  a times a group was deleted from service tree. Helps in debugging.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00
Vivek Goyal
2868ef7b39 blkio: Some debugging aids for CFQ
o Some debugging aids for CFQ.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00
Vivek Goyal
b1c3576961 blkio: Take care of cgroup deletion and cfq group reference counting
o One can choose to change elevator or delete a cgroup. Implement group
  reference counting so that both elevator exit and cgroup deletion can
  take place gracefully.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Nauman Rafique <nauman@google.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:52 +01:00
Vivek Goyal
31e4c28d95 blkio: Introduce blkio controller cgroup interface
o This is basic implementation of blkio controller cgroup interface. This is
  the common interface visible to user space and should be used by different
  IO control policies as we implement those.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-12-03 19:28:51 +01:00