kernel_optimize_test

History

Hidetoshi Seto 0cf55e1ec0 sched, cputime: Introduce thread_group_times() This is a real fix for problem of utime/stime values decreasing described in the thread: http://lkml.org/lkml/2009/11/3/522 Now cputime is accounted in the following way: - {u,s}time in task_struct are increased every time when the thread is interrupted by a tick (timer interrupt). - When a thread exits, its {u,s}time are added to signal->{u,s}time, after adjusted by task_times(). - When all threads in a thread_group exits, accumulated {u,s}time (and also c{u,s}time) in signal struct are added to c{u,s}time in signal struct of the group's parent. So {u,s}time in task struct are "raw" tick count, while {u,s}time and c{u,s}time in signal struct are "adjusted" values. And accounted values are used by: - task_times(), to get cputime of a thread: This function returns adjusted values that originates from raw {u,s}time and scaled by sum_exec_runtime that accounted by CFS. - thread_group_cputime(), to get cputime of a thread group: This function returns sum of all {u,s}time of living threads in the group, plus {u,s}time in the signal struct that is sum of adjusted cputimes of all exited threads belonged to the group. The problem is the return value of thread_group_cputime(), because it is mixed sum of "raw" value and "adjusted" value: group's {u,s}time = foreach(thread){{u,s}time} + exited({u,s}time) This misbehavior can break {u,s}time monotonicity. Assume that if there is a thread that have raw values greater than adjusted values (e.g. interrupted by 1000Hz ticks 50 times but only runs 45ms) and if it exits, cputime will decrease (e.g. -5ms). To fix this, we could do: group's {u,s}time = foreach(t){task_times(t)} + exited({u,s}time) But task_times() contains hard divisions, so applying it for every thread should be avoided. This patch fixes the above problem in the following way: - Modify thread's exit (= __exit_signal()) not to use task_times(). It means {u,s}time in signal struct accumulates raw values instead of adjusted values. As the result it makes thread_group_cputime() to return pure sum of "raw" values. - Introduce a new function thread_group_times(task, utime, *stime) that converts "raw" values of thread_group_cputime() to "adjusted" values, in same calculation procedure as task_times(). - Modify group's exit (= wait_task_zombie()) to use this introduced thread_group_times(). It make c{u,s}time in signal struct to have adjusted values like before this patch. - Replace some thread_group_cputime() by thread_group_times(). This replacements are only applied where conveys the "adjusted" cputime to users, and where already uses task_times() near by it. (i.e. sys_times(), getrusage(), and /proc/<PID>/stat.) This patch have a positive side effect: - Before this patch, if a group contains many short-life threads (e.g. runs 0.9ms and not interrupted by ticks), the group's cputime could be invisible since thread's cputime was accumulated after adjusted: imagine adjustment function as adj(ticks, runtime), {adj(0, 0.9) + adj(0, 0.9) + ....} = {0 + 0 + ....} = 0. After this patch it will not happen because the adjustment is applied after accumulated. v2: - remove if()s, put new variables into signal_struct. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Spencer Candland <spencer@bluehost.com> Cc: Americo Wang <xiyou.wangcong@gmail.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> LKML-Reference: <4B162517.8040909@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>		2009-12-02 17:32:40 +01:00
..
9p
adfs
affs
afs
autofs
autofs4
befs
bfs
btrfs	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable	2009-11-11 13:38:59 -08:00
cachefiles
cifs	cifs: don't use CIFSGetSrvInodeNumber in is_path_accessible	2009-11-06 22:06:14 +00:00
coda
configfs
cramfs
debugfs
devpts
dlm
ecryptfs
efs
exofs
exportfs
ext2
ext3	ext3: Wait for proper transaction commit on fsync	2009-11-11 15:22:49 +01:00
ext4	ext4: partial revert to fix double brelse WARNING()	2009-11-08 15:45:44 -05:00
fat
freevxfs
fscache
fuse
gfs2
hfs
hfsplus
hostfs
hpfs
hppfs
hugetlbfs
isofs
jbd	JBD/JBD2: free j_wbuf if journal init fails.	2009-11-11 15:24:14 +01:00
jbd2	JBD/JBD2: free j_wbuf if journal init fails.	2009-11-11 15:24:14 +01:00
jffs2
jfs
lockd
minix
ncpfs
nfs
nfs_common
nfsd
nilfs2	nilfs2: fix missing cleanup of gc cache on error cases	2009-11-08 19:04:25 +09:00
nls
notify
ntfs
ocfs2
omfs
openpromfs
partitions
proc	sched, cputime: Introduce thread_group_times()	2009-12-02 17:32:40 +01:00
qnx4
quota
ramfs
reiserfs
romfs
smbfs
squashfs
sysfs
sysv
ubifs
udf
ufs
xfs
aio.c
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf_fdpic.c
binfmt_elf.c
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
binfmt_som.c
bio-integrity.c
bio.c
block_dev.c
buffer.c
char_dev.c
compat_binfmt_elf.c
compat_ioctl.c
compat.c
dcache.c
dcookies.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c
exec.c
fcntl.c
fifo.c
file_table.c
file.c
filesystems.c
fs_struct.c
fs-writeback.c
generic_acl.c
inode.c
internal.h
ioctl.c
ioprio.c
Kconfig
Kconfig.binfmt
libfs.c
locks.c
Makefile
mbcache.c
mpage.c
namei.c
namespace.c
nfsctl.c
no-block.c
open.c
pipe.c
pnode.c
pnode.h
posix_acl.c
read_write.c
read_write.h
readdir.c
select.c
seq_file.c
signalfd.c
splice.c
stack.c
stat.c
super.c
sync.c
timerfd.c
utimes.c
xattr_acl.c
xattr.c