kernel_optimize_test

Greg Banks 59a252ff8c knfsd: avoid overloading the CPU scheduler with enormous load averages	2009-03-18 17:38:41 -04:00

Avoid overloading the CPU scheduler with enormous load averages when handling high call-rate NFS loads. When the knfsd bottom half is made aware of an incoming call by the socket layer, it tries to choose an nfsd thread and wake it up. As long as there are idle threads, one will be woken up.

If there are a lot of nfsd threads (a sensible configuration when the server is disk-bound or is running an HSM), there will be many more nfsd threads than CPUs to run them. Under a high call-rate, low service-time workload, the result is that almost every nfsd is runnable, but only a handful are actually able to run. This situation causes two significant problems:

1. The CPU scheduler takes over 10% of each CPU, which robs the nfsd threads of valuable CPU time.

2. At a high enough load, the nfsd threads starve userspace threads of CPU time, to the point where daemons like portmap and rpc.mountd do not schedule for tens of seconds at a time. Clients attempting to mount an NFS filesystem time out at the very first step (opening a TCP connection to portmap) because portmap cannot wake up from select() and call accept() in time.

Disclaimer: these effects were observed on a SLES9 kernel; modern kernels' schedulers may behave more gracefully.

The solution is simple: keep in each svc_pool a counter of the number of threads which have been woken but have not yet run, and do not wake any more if that count reaches an arbitrary small threshold.

Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fit in the server's RAM, so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. The server was running 128 nfsds.

Profiling showed schedule() taking 6.7% of every CPU, and __wake_up() taking 5.2%. This patch drops those contributions to 3.0% and 2.2%. Load average was over 120 before the patch, and 20.9 after.

This patch is a forward-ported version of knfsd-avoid-nfsd-overload which has been shipping in the SGI "Enhanced NFS" product since 2006. It has been posted before:

http://article.gmane.org/gmane.linux.nfs/10374

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
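
The wakeup throttling described in the commit message can be illustrated with a small userspace model. This is only a sketch of the idea, not the kernel patch itself: `struct pool`, `pool_enqueue()`, `pool_thread_runs()`, and the threshold value `POOL_MAX_WAKING` are hypothetical names and numbers chosen for illustration (the commit only says the threshold is "arbitrary" and "small"), and real code would need locking around the counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical threshold; the commit message does not state the value. */
#define POOL_MAX_WAKING 5

/* Simplified stand-in for a per-pool structure such as svc_pool. */
struct pool {
    int idle;    /* threads asleep and available to be woken */
    int waking;  /* threads woken but not yet run */
    int queued;  /* calls left pending because no wakeup was issued */
};

/*
 * Called when a new call arrives. Wakes one idle thread unless too many
 * threads have already been woken and are still waiting for a CPU.
 * Returns true if a wakeup was issued.
 */
static bool pool_enqueue(struct pool *p)
{
    if (p->idle > 0 && p->waking < POOL_MAX_WAKING) {
        p->idle--;
        p->waking++;
        return true;
    }
    /* Leave the call pending; a thread that runs later will pick it up. */
    p->queued++;
    return false;
}

/*
 * Called when a previously woken thread finally gets the CPU.
 * It stops counting as "waking" and takes one pending call, if any.
 */
static void pool_thread_runs(struct pool *p)
{
    assert(p->waking > 0);
    p->waking--;
    if (p->queued > 0)
        p->queued--;
}
```

In this model, a burst of 100 calls against a pool of 128 idle threads issues only `POOL_MAX_WAKING` wakeups; the remaining calls stay queued until a woken thread runs, which is what keeps the number of runnable threads, and hence the load average, bounded.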
..
9p	9p: fix endian issues [attempt 3]	2009-02-06 22:07:41 -08:00
802	net: fix tokenring license	2009-03-03 23:48:50 -08:00
8021q	vlan: Fix vlan-in-vlan crashes.	2009-03-04 23:46:25 -08:00
appletalk	appletalk: convert aarp to net_device_ops	2009-01-07 17:21:44 -08:00
atm
ax25	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	2008-12-28 12:49:40 -08:00
bluetooth	bluetooth: driver API update	2009-01-07 17:23:17 -08:00
bridge	bridge: Fix LRO crash with tun	2009-02-09 15:07:18 -08:00
can	can: fix slowpath issue in hrtimer callback function	2009-01-14 21:06:55 -08:00
core	vlan: Fix vlan-in-vlan crashes.	2009-03-04 23:46:25 -08:00
dcb	DCB: fix kfree(skb)	2009-01-04 17:29:21 -08:00
dccp	dccp ccid-3: Fix RFC reference	2009-01-11 00:17:22 -08:00
decnet
dsa	dsa: convert to net_device_ops (v2)	2009-01-06 16:45:26 -08:00
econet
ethernet
ipv4	tcp: Like icmp use register_pernet_subsys	2009-03-03 01:14:21 -08:00
ipv6	ipv6: Fix BUG when disabled ipv6 module is unloaded	2009-03-11 09:22:51 -07:00
ipx
irda	tty: Fix an ircomm warning and note another bug	2009-01-02 10:19:43 -08:00
iucv	s390: remove s390_root_dev_*()	2009-01-06 10:44:34 -08:00
key	af_key: initialize xfrm encap_oa	2009-01-25 20:49:14 -08:00
lapb
llc
mac80211	mac80211: restrict to AP in outgoing interface heuristic	2009-02-11 11:27:17 -05:00
netfilter	netfilter: xt_recent: fix proc-file addition/removal of IPv4 addresses	2009-02-24 14:53:12 +01:00
netlabel	netlabel: Update kernel configuration API	2008-12-31 12:54:11 -05:00
netlink	netlink: invert error code in netlink_set_err()	2009-03-03 23:37:30 -08:00
netrom	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	2008-12-28 12:49:40 -08:00
packet	net: packet socket packet_lookup_frame fix	2009-02-01 01:53:29 -08:00
phonet	Phonet: do not compute unused value	2009-02-10 17:14:50 -08:00
rfkill	net/rfkill/rfkill.c: fix unused rfkill_led_trigger() warning	2009-01-04 17:11:24 -08:00
rose	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	2008-12-28 12:49:40 -08:00
rxrpc	RxRPC: Fix a potential NULL dereference	2009-02-06 21:50:52 -08:00
sched	pkt_sched: act_police: Fix a rate estimator test.	2009-03-04 17:38:10 -08:00
sctp	SCTP: change sctp_ctl_sock_init() to try IPv4 if IPv6 fails	2009-03-04 03:20:26 -08:00
sunrpc	knfsd: avoid overloading the CPU scheduler with enormous load averages	2009-03-18 17:38:41 -04:00
tipc	net/tipc/bcast.h: use ARRAY_SIZE	2009-01-11 00:06:33 -08:00
unix	introduce new LSM hooks where vfsmount is available.	2008-12-31 18:07:37 -05:00
wanrouter
wimax	wimax: fix oops in wimax_dev_get_by_genl_info() when looking up non-wimax iface	2009-02-12 17:00:20 -08:00
wireless	cfg80211: test before subtraction on unsigned	2009-03-06 15:54:32 -05:00
x25
xfrm	xfrm: Fix xfrm_state_find() wrt. wildcard source address.	2009-03-13 14:22:40 -07:00
compat.c
Kconfig	net: Move config NET_NS to from net/Kconfig to init/Kconfig	2009-01-26 12:25:55 -08:00
Makefile	wimax: Makefile, Kconfig and docbook linkage for the stack	2009-01-07 10:00:17 -08:00
nonet.c
socket.c	[CVE-2009-0029] System call wrappers part 22	2009-01-14 14:15:27 +01:00
sysctl_net.c
TUNABLE