kernel_optimize_test

Author	SHA1	Message	Date
David S. Miller	aa93466bdf	[TCP]: Eliminate redundant computations in tcp_write_xmit(). tcp_snd_test() is run for every packet output by a single call to tcp_write_xmit(), but this is not necessary. For one, the congestion window space needs to only be calculated one time, then used throughout the duration of the loop. This cleanup also makes experimenting with different TSO packetization schemes much easier. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:20:09 -07:00
David S. Miller	7f4dd0a943	[TCP]: Break out tcp_snd_test() into it's constituent parts. tcp_snd_test() does several different things, use inline functions to express this more clearly. 1) It initializes the TSO count of SKB, if necessary. 2) It performs the Nagle test. 3) It makes sure the congestion window is adhered to. 4) It makes sure SKB fits into the send window. This cleanup also sets things up so that things like the available packets in the congestion window does not need to be calculated multiple times by packet sending loops such as tcp_write_xmit(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:54 -07:00
David S. Miller	55c97f3e99	[TCP]: Fix __tcp_push_pending_frames() 'nonagle' handling. 'nonagle' should be passed to the tcp_snd_test() function as 'TCP_NAGLE_PUSH' if we are checking an SKB not at the tail of the write_queue. This is because Nagle does not apply to such frames since we cannot possibly tack more data onto them. However, while doing this __tcp_push_pending_frames() makes all of the packets in the write_queue use this modified 'nonagle' value. Fix the bug and simplify this function by just calling tcp_write_xmit() directly if sk_send_head is non-NULL. As a result, we can now make tcp_data_snd_check() just call tcp_push_pending_frames() instead of the specialized __tcp_data_snd_check(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:38 -07:00
David S. Miller	a2e2a59c93	[TCP]: Fix redundant calculations of tcp_current_mss() tcp_write_xmit() uses tcp_current_mss(), but some of it's callers, namely __tcp_push_pending_frames(), already has this value available already. While we're here, fix the "cur_mss" argument to be "unsigned int" instead of plain "unsigned". Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:23 -07:00
David S. Miller	92df7b518d	[TCP]: tcp_write_xmit() tabbing cleanup Put the main basic block of work at the top-level of tabbing, and mark the TCP_CLOSE test with unlikely(). Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:19:06 -07:00
David S. Miller	a762a98007	[TCP]: Kill extra cwnd validate in __tcp_push_pending_frames(). The tcp_cwnd_validate() function should only be invoked if we actually send some frames, yet __tcp_push_pending_frames() will always invoke it. tcp_write_xmit() does the call for us, so the call here can simply be removed. Also, tcp_write_xmit() can be marked static. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:51 -07:00
David S. Miller	f44b527177	[TCP]: Add missing skb_header_release() call to tcp_fragment(). When we add any new packet to the TCP socket write queue, we must call skb_header_release() on it in order for the TSO sharing checks in the drivers to work. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:34 -07:00
David S. Miller	84d3e7b957	[TCP]: Move __tcp_data_snd_check into tcp_output.c It reimplements portions of tcp_snd_check(), so it we move it to tcp_output.c we can consolidate it's logic much easier in a later change. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:18 -07:00
David S. Miller	f6302d1d78	[TCP]: Move send test logic out of net/tcp.h This just moves the code into tcp_output.c, no code logic changes are made by this patch. Using this as a baseline, we can begin to untangle the mess of comparisons for the Nagle test et al. We will also be able to reduce all of the redundant computation that occurs when outputting data packets. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:18:03 -07:00
David S. Miller	fc6415bcb0	[TCP]: Fix quick-ack decrementing with TSO. On each packet output, we call tcp_dec_quickack_mode() if the ACK flag is set. It drops tp->ack.quick until it hits zero, at which time we deflate the ATO value. When doing TSO, we are emitting multiple packets with ACK set, so we should decrement tp->ack.quick that many segments. Note that, unlike this case, tcp_enter_cwr() should not take the tcp_skb_pcount(skb) into consideration. That function, one time, readjusts tp->snd_cwnd and moves into TCP_CA_CWR state. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:17:45 -07:00
David S. Miller	c65f7f00c5	[TCP]: Simplify SKB data portion allocation with NETIF_F_SG. The ideal and most optimal layout for an SKB when doing scatter-gather is to put all the headers at skb->data, and all the user data in the page array. This makes SKB splitting and combining extremely simple, especially before a packet goes onto the wire the first time. So, when sk_stream_alloc_pskb() is given a zero size, make sure there is no skb_tailroom(). This is achieved by applying SKB_DATA_ALIGN() to the header length used here. Next, make select_size() in TCP output segmentation use a length of zero when NETIF_F_SG is true on the outgoing interface. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:17:25 -07:00
David Chau	52609c0b56	[NET]: improve readability of dev_set_promiscuity() in net/core/dev.c A trivial patch to improve the readability of dev_set_promiscuity() in net/core/dev.c. New code does exactly the same thing as original code. Signed-off-by: David Chau <ddcc@mit.edu> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:11:06 -07:00
Robert Olsson	2f36895aa7	[IPV4]: More broken memory allocation fixes for fib_trie Below a patch to preallocate memory when doing resize of trie (inflate halve) If preallocations fails it just skips the resize of this tnode for this time. The oops we got when killing bgpd (with full routing) is now gone. Patrick memory patch is also used. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:02:40 -07:00
Thomas Graf	db1322b801	[DECNET]: Fix memset overflow on 64bit archs while dumping decnet routing rules Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:01:25 -07:00
Eric Dumazet	bb1d23b026	[IPV4]: Bug fix in rt_check_expire() - rt_check_expire() fixes (an overflow occured if size of the hash was >= 65536) reminder of the bugfix: The rt_check_expire() has a serious problem on machines with large route caches, and a standard HZ value of 1000. With default values, ie ip_rt_gc_interval = 60*HZ = 60000 ; the loop count : for (t = ip_rt_gc_interval << rt_hash_log; t >= 0; overflows (t is a 31 bit value) as soon rt_hash_log is >= 16 (65536 slots in route cache hash table). In this case, rt_check_expire() does nothing at all Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 15:00:32 -07:00
Eric Dumazet	424c4b70cc	[IPV4]: Use the fancy alloc_large_system_hash() function for route hash table - rt hash table allocated using alloc_large_system_hash() function. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:58:19 -07:00
Eric Dumazet	22c047ccbc	[NET]: Hashed spinlocks in net/ipv4/route.c - Locking abstraction - Spinlocks moved out of rt hash table : Less memory (50%) used by rt hash table. it's a win even on UP. - Sizing of spinlocks table depends on NR_CPUS Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:55:24 -07:00
Patrick McHardy	f0e36f8cee	[IPV4]: Handle large allocations in fib_trie Inflating a node a couple of times makes it exceed the 128k kmalloc limit. Use __get_free_pages for allocations > PAGE_SIZE, as in fib_hash. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:44:55 -07:00
Herbert Xu	e2ed4052aa	[IPV6]: Makes IPv6 rcv registration happen last during initialisation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:41:20 -07:00
Herbert Xu	30e224d76f	[IPV4]: Fix crash in ip_rcv while booting related to netconsole Makes IPv4 ip_rcv registration happen last in af_inet. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:40:10 -07:00
Thomas Graf	023e09a767	[PKT_SCHED]: Report rate estimator configuration errors during qdisc allocation Current behaviour is to not report an error if a rate estimator is created together with a qdisc and the configuration of the rate estimator is bogus. This leads to unexpected behaviour because the user is not notified. New behaviour is to report the error and let the whole qdisc creation operation fail so the user is able to fix his mistake. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:15:53 -07:00
Thomas Graf	3d54b82fdf	[PKT_SCHED]: Cleanup qdisc creation and alignment macros Adds qdisc_alloc() to share code between qdisc_create() and qdisc_create_dflt(). Hides the qdisc alignment behind macros and makes use of them. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:15:09 -07:00
Thomas Graf	e176fe8954	[NET]: Remove unused security member in sk_buff Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:12:44 -07:00
Patrick McHardy	3154e540e3	[NET]: net/core/filter.c: make len cover the entire packet As suggested by Herbert Xu: Since we don't require anything to be in the linear packet range anymore make len cover the entire packet. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:10:40 -07:00
Patrick McHardy	0b05b2a49e	[NET]: Consolidate common code in net/core/filter.c Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:10:21 -07:00
Patrick McHardy	6935d46c2d	[NET]: Remove redundant code in net/core/filter.c skb_header_pointer handles linear and non-linear data, no need to handle linear data again. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-07-05 14:08:57 -07:00
Patrick McHardy	9666dae510	[NETFILTER]: Fix connection tracking bug in 2.6.12 In 2.6.12 we started dropping the conntrack reference when a packet leaves the IP layer. This broke connection tracking on a bridge, because bridge-netfilter defers calling some NF_IP_* hooks to the bridge layer for locally generated packets going out a bridge, where the conntrack reference is no longer available. This patch keeps the reference in this case as a temporary solution, long term we will remove the defered hook calling. No attempt is made to drop the reference in the bridge-code when it is no longer needed, tc actions could already have sent the packet anywhere. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 16:04:44 -07:00
Denis Vlasenko	ff593c592a	[NET]: Micro optimization in eth_header() Signed-off-by: Denis Vlasenko <vda@ilport.com.ua> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 15:49:06 -07:00
YOSHIFUJI Hideaki	7fe40f73d7	[IPV6]: remove more unused IPV6_AUTHHDR things. Remove two more unused IPV6_AUTHHDR option things, which I failed to remove them last time, plus, mark IPV6_AUTHHDR obsolete. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 15:46:24 -07:00
Neil Horman	fb3d89498d	[IPVS]: Close race conditions on ip_vs_conn_tab list modification In an smp system, it is possible for an connection timer to expire, calling ip_vs_conn_expire while the connection table is being flushed, before ct_write_lock_bh is acquired. Since the list iterator loop in ip_vs_con_flush releases and re-acquires the spinlock (even though it doesn't re-enable softirqs), it is possible for the expiration function to modify the connection list, while it is being traversed in ip_vs_conn_flush. The result is that the next pointer gets set to NULL, and subsequently dereferenced, resulting in an oops. Signed-off-by: Neil Horman <nhorman@redhat.com> Acked-by: JulianAnastasov Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 15:40:02 -07:00
Robert Olsson	f835e471b5	[IPV4]: Broken memory allocation in fib_trie This should help up the insertion... but the resize is more crucial. and complex and needs some thinking. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 15:00:39 -07:00
Vlad Yasevich	2f85a42964	[SCTP] Make init & delayed sack timeouts configurable by user. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:24:23 -07:00
Maxime Bizon	7a1af5d7bb	[IPV4]: ipconfig.c: fix dhcp timeout behaviour I think there is a small bug in ipconfig.c in case IPCONFIG_DHCP is set and dhcp is used. When a DHCPOFFER is received, ip address is kept until we get DHCPACK. If no ack is received, ic_dynamic() returns negatively, but leaves the offered ip address in ic_myaddr. This makes the main loop in ip_auto_config() break and uses the maybe incomplete configuration. Not sure if it's the best way to do, but the following trivial patch correct this. Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:21:12 -07:00
Dietmar Eggemann	2c2910a401	[IPV4]: Snmpv2 Mib IP counter ipInAddrErrors support I followed Thomas' proposal to see every martian destination as a case where the ipInAddrErrors counter has to be incremented. There are two advantages by doing so: (1) The relation between the ipInReceive counter and all the other ipInXXX counters is more accurate in the case the RTN_UNICAST code check fails and (2) it makes the code in ip_route_input_slow easier. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:06:23 -07:00
YOSHIFUJI Hideaki	ae9cda5d65	[IPV6]: Don't dump temporary addresses twice Each IPv6 Temporary Address (w/ CONFIG_IPV6_PRIVACY) is dumped twice to netlink. Because temporary addresses are listed in idev->addr_list, there's no need to dump idev->tempaddr separately. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 13:00:30 -07:00
Patrick McHardy	8a47077a0b	[NETLINK]: Missing padding fields in dumped structures Plug holes with padding fields and initialized them to zero. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:56:45 -07:00
Patrick McHardy	9ef1d4c7c7	[NETLINK]: Missing initializations in dumped data Mostly missing initialization of padding fields of 1 or 2 bytes length, two instances of uninitialized nlmsgerr->msg of 16 bytes length. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:55:30 -07:00
Patrick McHardy	b3563c4fbf	[NETLINK]: Clear padding in netlink messages Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:54:43 -07:00
Harald Welte	4095ebf1e6	[NETFILTER]: ipt_CLUSTERIP: fix ARP mangling This patch adds mangling of ARP requests (in addition to replies), since ARP caches are made from snooping both requests and replies. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:49:30 -07:00
David S. Miller	85c1937b26	[EBTABLES]: Fix thinkos in ebt_log.c When converting over the skb_header_pointer(), I converted parts of this module incorrectly. Kill the 'u' union in ebt_log() and all the bogus references to it. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-28 12:39:40 -07:00
pageexec	4da62fc70d	[IPVS]: Fix for overflows From: <pageexec@freemail.hu> $subject was fixed in 2.4 already, 2.6 needs it as well. The impact of the bugs is a kernel stack overflow and privilege escalation from CAP_NET_ADMIN via the IP_VS_SO_SET_STARTDAEMON/IP_VS_SO_GET_DAEMON ioctls. People running with 'root=all caps' (i.e., most users) are not really affected (there's nothing to escalate), but SELinux and similar users should take it seriously if they grant CAP_NET_ADMIN to other users. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 16:00:19 -07:00
David S. Miller	d470e3b483	[NETLINK]: Fix two socket hashing bugs. 1) netlink_release() should only decrement the hash entry count if the socket was actually hashed. This was causing hash->entries to underflow, which resulting in all kinds of troubles. On 64-bit systems, this would cause the following conditional to erroneously trigger: err = -ENOMEM; if (BITS_PER_LONG > 32 && unlikely(hash->entries >= UINT_MAX)) goto err; 2) netlink_autobind() needs to propagate the error return from netlink_insert(). Otherwise, callers will not see the error as they should and thus try to operate on a socket with a zero pid, which is very bad. However, it should not propagate -EBUSY. If two threads race to autobind the socket, that is fine. This is consistent with the autobind behavior in other protocols. So bug #1 above, combined with this one, resulted in hangs on netlink_sendmsg() calls to the rtnetlink socket. We'd try to do the user sendmsg() with the socket's pid set to zero, later we do a socket lookup using that pid (via the value we stashed away in NETLINK_CB(skb).pid), but that won't give us the user socket, it will give us the rtnetlink socket. So when we try to wake up the receive queue, we dive back into rtnetlink_rcv() which tries to recursively take the rtnetlink semaphore. Thanks to Jakub Jelink for providing backtraces. Also, thanks to Herbert Xu for supplying debugging patches to help track this down, and also finding a mistake in an earlier version of this fix. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:31:51 -07:00
Robert Olsson	64053beeb5	[PKTGEN]: Fix random packet sizes causing panic Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:27:10 -07:00
Adrian Bunk	60fe740320	[TCP]: Let TCP_CONG_ADVANCED default to n It doesn't seem to make much sense to let an "If unsure, say N." option default to y. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:21:15 -07:00
David S. Miller	6c3607676c	[IPV4]: Fix thinko in TCP_CONG_BIC default. Since it is tristate when we offer it as a choice, we should definte it also as tristate when forcing it as the default. Otherwise kconfig warns. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-26 15:20:20 -07:00
Linus Torvalds	2031d0f586	Merge Christoph's freeze cleanup patch	2005-06-25 17:16:53 -07:00
Christoph Lameter	3e1d1d28d9	[PATCH] Cleanup patch for process freezing 1. Establish a simple API for process freezing defined in linux/include/sched.h: frozen(process) Check for frozen process freezing(process) Check if a process is being frozen freeze(process) Tell a process to freeze (go to refrigerator) thaw_process(process) Restart process frozen_process(process) Process is frozen now 2. Remove all references to PF_FREEZE and PF_FROZEN from all kernel sources except sched.h 3. Fix numerous locations where try_to_freeze is manually done by a driver 4. Remove the argument that is no longer necessary from two function calls. 5. Some whitespace cleanup 6. Clear potential race in refrigerator (provides an open window of PF_FREEZE cleared before setting PF_FROZEN, recalc_sigpending does not check PF_FROZEN). This patch does not address the problem of freeze_processes() violating the rule that a task may only modify its own flags by setting PF_FREEZE. This is not clean in an SMP environment. freeze(process) is therefore not SMP safe! Signed-off-by: Christoph Lameter <christoph@lameter.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-25 17:10:13 -07:00
David S. Miller	c54d7e03c3	[SUNRPC]: Fix {s,}size_t printf format strings in xprt.c Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 19:57:07 -07:00
David S. Miller	a6484045fd	[TCP]: Do not present confusing congestion control options by default. Create TCP_CONG_ADVANCED option, akin to IP_ADVANCED_ROUTER, which when disabled will bypass all of the congestion control Kconfig options and leave the user with a safe default. That safe default is currently BIC-TCP with new Reno as a fallback. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 18:07:51 -07:00
David S. Miller	bb298ca3ce	[IPV4]: Move FIB lookup algorithm choice under IP_ADVANCED_ROUTING Most users need not be concerned with a complex choice of what FIB lookup algorithm to use. So give them the safe default of IP_FIB_HASH if IP_ADVANCED_ROUTING is disabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 17:50:53 -07:00
David S. Miller	f7704347a7	[PKT_SCHED]: Make TEXTSEARCH* options only selected. Do not present these confusing new options to the user unless he picked some facility that makes use of it, such as NET_EMATCH_TEXT. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-24 17:39:03 -07:00
Linus Torvalds	59a49e3871	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2005-06-24 00:31:46 -07:00
Adrian Bunk	52c1da3953	[PATCH] make various thing static Another rollup of patches which give various symbols static scope Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:43 -07:00
NeilBrown	5ba266d632	[PATCH] knfsd: nfsd4: fix probe_callback rpc_create_client was modified recently to do its own (synchronous) NULL ping of the server. We'd rather do that on our own, asynchronously, so that we don't have to block the nfsd thread doing the probe, and so that setclientid handling (hence, client mounts) can proceed normally whether the callback is succesful or not. (We can still function fine without the callback channel--we just won't be able to give out delegations till it's verified to work.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-24 00:06:30 -07:00
David S. Miller	f2d368fa3e	[PKT_SCHED]: Make NET_EMATCH_TEXT select TEXTSEARCh Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 23:55:41 -07:00
Thomas Graf	d675c989ed	[PKT_SCHED]: Packet classification based on textsearch (ematch) Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:58 -07:00
Thomas Graf	3fc7e8a6d8	[NET]: skb_find_text() - Find a text pattern in skb data Finds a pattern in the skb data according to the specified textsearch configuration. Use textsearch_next() to retrieve subsequent occurrences of the pattern. Returns the offset to the first occurrence or UINT_MAX if no match was found. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 21:00:17 -07:00
Thomas Graf	677e90eda3	[NET]: Zerocopy sequential reading of skb data Implements sequential reading for both linear and non-linear skb data at zerocopy cost. The data is returned in chunks of arbitary length, therefore random access is not possible. Usage: from := 0 to := 128 state := undef data := undef len := undef consumed := 0 skb_prepare_seq_read(skb, from, to, &state) while (len = skb_seq_read(consumed, &data, &state)) != 0 do /* do something with 'data' of length 'len' / if abort then / abort read if we don't wait for * skb_seq_read() to return 0 / skb_abort_seq_read(&state) return endif / not necessary to consume all of 'len' */ consumed += len done Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:59:51 -07:00
Stephen Hemminger	5f8ef48d24	[TCP]: Allow choosing TCP congestion control via sockopt. Allow using setsockopt to set TCP congestion control to use on a per socket basis. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:37:36 -07:00
Stephen Hemminger	51b0bdedb8	[NET]: Separate two usages of netdev_max_backlog. Separate out the two uses of netdev_max_backlog. One controls the upper bound on packets processed per softirq, the new name for this is netdev_budget; the other controls the limit on packets queued via netif_rx. Increase the max_backlog default to account for faster processors. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:14:40 -07:00
Stephen Hemminger	31aa02c53c	[NET]: Eliminate netif_rx massive packet drops. Eliminate the throttling behaviour when the netif receive queue fills because it behaves badly when using high speed networks under load. The throttling cause multiple packet drops that cause TCP to go into slow start mode. The same effective patch has been part of BIC TCP and H-TCP as well as part of Web100. The existing code drops 100's of packets when the queue fills; this changes it to individual packet drop-tail. Signed-off-by: Stephen Hemmminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:12:48 -07:00
Stephen Hemminger	34008d8c63	[NET]: Remove obsolete netif_rx congestion sensing mechanism. Remove the congestion sensing mechanism from netif_rx, and always return either full or empty. Almost no driver checks the return value from netif_rx, and those that do only use it for debug messages. The original design of netif_rx was to do flow control based on the receive queue, but NAPI has supplanted this and no driver uses the feedback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:10:00 -07:00
Stephen Hemminger	c1ebcdb8c4	[NET]: Remove obsolete fastroute stats. Remove last vestiages of fastroute code that is no longer used. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 20:08:59 -07:00
John Heffner	0e57976b63	[TCP]: Add Scalable TCP congestion control module. This patch implements Tom Kelly's Scalable TCP congestion control algorithm for the modular framework. The algorithm has some nice scaling properties, and has been used a fair bit in research, though is known to have significant fairness issues, so it's not really suitable for general purpose use. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:29:07 -07:00
Baruch Even	a7868ea68d	[TCP]: Add H-TCP congestion control module. H-TCP is a congestion control algorithm developed at the Hamilton Institute, by Douglas Leith and Robert Shorten. It is extending the standard Reno algorithm with mode switching is thus a relatively simple modification. H-TCP is defined in a layered manner as it is still a research platform. The basic form includes the modification of beta according to the ratio of maxRTT to min RTT and the alpha=2factor(1-beta) relation, where factor is dependant on the time since last congestion. The other layers improve convergence by adding appropriate factors to alpha. The following patch implements the H-TCP algorithm in it's basic form. Signed-Off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:28:11 -07:00
Stephen Hemminger	b87d8561d8	[TCP]: Add TCP Vegas congestion control module. TCP Vegas code modified for the new TCP infrastructure. Vegas now uses microsecond resolution timestamps for better estimation of performance over higher speed links. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:27:19 -07:00
Daniele Lacamera	835b3f0c0d	[TCP]: Add TCP Hybla congestion control module. TCP Hybla congestion avoidance. - "In heterogeneous networks, TCP connections that incorporate a terrestrial or satellite radio link are greatly disadvantaged with respect to entirely wired connections, because of their longer round trip times (RTTs). To cope with this problem, a new TCP proposal, the TCP Hybla, is presented and discussed in the paper[1]. It stems from an analytical evaluation of the congestion window dynamics in the TCP standard versions (Tahoe, Reno, NewReno), which suggests the necessary modifications to remove the performance dependence on RTT.[...]"[1] [1]: Carlo Caini, Rosario Firrincieli, "TCP Hybla: a TCP enhancement for heterogeneous networks", International Journal of Satellite Communications and Networking Volume 22, Issue 5 , Pages 547 - 566. September 2004. Signed-off-by: Daniele Lacamera (root at danielinux.net)net Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:26:34 -07:00
John Heffner	a628d29b56	[TCP]: Add High Speed TCP congestion control module. Sally Floyd's high speed TCP congestion control. This is useful for comparison and research. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:24:58 -07:00
Stephen Hemminger	8727076289	[TCP]: Add TCP Westwood congestion control module. This is the existing 2.6.12 Westwood code moved from tcp_input to the new congestion framework. A lot of the inline functions have been eliminated to try and make it clearer. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:24:09 -07:00
Stephen Hemminger	83803034f4	[TCP]: Add TCP BIC congestion control module. TCP BIC congestion control reworked to use the new congestion control infrastructure. This version is more up to date than the BIC code in 2.6.12; it incorporates enhancements from BICTCP 1.1, to handle low latency links. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:23:25 -07:00
Stephen Hemminger	056ede6cfa	[TCP]: Report congestion control algorithm in tcp_diag. Enhancement to the tcp_diag interface used by the iproute2 ss command to report the tcp congestion control being used by a socket. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:21:28 -07:00
Stephen Hemminger	7c99c909fa	[TCP]: Change tcp_diag to use the existing __RTA_PUT() macro. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:20:36 -07:00
Stephen Hemminger	317a76f9a4	[TCP]: Add pluggable congestion control algorithm infrastructure. Allow TCP to have multiple pluggable congestion control algorithms. Algorithms are defined by a set of operations and can be built in or modules. The legacy "new RENO" algorithm is used as a starting point and fallback. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-23 12:19:55 -07:00
Paulo Marques	543537bd92	[PATCH] create a kstrdup library function This patch creates a new kstrdup library function and changes the "local" implementations in several places to use this function. Most of the changes come from the sound and net subsystems. The sound part had already been acknowledged by Takashi Iwai and the net part by David S. Miller. I left UML alone for now because I would need more time to read the code carefully before making changes there. Signed-off-by: Paulo Marques <pmarques@grupopie.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-06-23 09:45:18 -07:00
Linus Torvalds	060de20e82	Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	2005-06-22 23:11:50 -07:00
Shaun Pereira	ebc3f64b86	[X25]: Fast select with no restriction on response This patch is a follow up to patch 1 regarding "Selective Sub Address matching with call user data". It allows use of the Fast-Select-Acceptance optional user facility for X.25. This patch just implements fast select with no restriction on response (NRR). What this means (according to ITU-T Recomendation 10/96 section 6.16) is that if in an incoming call packet, the relevant facility bits are set for fast-select-NRR, then the called DTE can issue a direct response to the incoming packet using a call-accepted packet that contains call-user-data. This patch allows such a response. The called DTE can also respond with a clear-request packet that contains call-user-data. However, this feature is currently not implemented by the patch. How is Fast Select Acceptance used? By default, the system does not allow fast select acceptance (as before). To enable a response to fast select acceptance, After a listen socket in created and bound as follows socket(AF_X25, SOCK_SEQPACKET, 0); bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr)); but before a listen system call is made, the following ioctl should be used. ioctl(call_soc,SIOCX25CALLACCPTAPPRV); Now the listen system call can be made listen(call_soc, 4); After this, an incoming-call packet will be accepted, but no call-accepted packet will be sent back until the following system call is made on the socket that accepts the call ioctl(vc_soc,SIOCX25SENDCALLACCPT); The network (or cisco xot router used for testing here) will allow the application server's call-user-data in the call-accepted packet, provided the call-request was made with Fast-select NRR. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:16:17 -07:00
Shaun Pereira	cb65d506c3	[X25]: Selective sub-address matching with call user data. From: Shaun Pereira <spereira@tusc.com.au> This is the first (independent of the second) patch of two that I am working on with x25 on linux (tested with xot on a cisco router). Details are as follows. Current state of module: A server using the current implementation (2.6.11.7) of the x25 module will accept a call request/ incoming call packet at the listening x.25 address, from all callers to that address, as long as NO call user data is present in the packet header. If the server needs to choose to accept a particular call request/ incoming call packet arriving at its listening x25 address, then the kernel has to allow a match of call user data present in the call request packet with its own. This is required when multiple servers listen at the same x25 address and device interface. The kernel currently matches ALL call user data, if present. Current Changes: This patch is a follow up to the patch submitted previously by Andrew Hendry, and allows the user to selectively control the number of octets of call user data in the call request packet, that the kernel will match. By default no call user data is matched, even if call user data is present. To allow call user data matching, a cudmatchlength > 0 has to be passed into the kernel after which the passed number of octets will be matched. Otherwise the kernel behavior is exactly as the original implementation. This patch also ensures that as is normally the case, no call user data will be present in the Call accepted / call connected packet sent back to the caller Future Changes on next patch: There are cases however when call user data may be present in the call accepted packet. According to the X.25 recommendation (ITU-T 10/96) section 5.2.3.2 call user data may be present in the call accepted packet provided the fast select facility is used. My next patch will include this fast select utility and the ability to send up to 128 octets call user data in the call accepted packet provided the fast select facility is used. I am currently testing this, again with xot on linux and cisco. Signed-off-by: Shaun Pereira <spereira@tusc.com.au> (With a fix from Alexey Dobriyan <adobriyan@gmail.com>) Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:15:01 -07:00
James Lamanna	68d3187200	[EBTABLES]: vfree() checking cleanups From: jlamanna@gmail.com ebtables.c vfree() checking cleanups. Signed-off by: James Lamanna <jlamanna@gmail.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:12:57 -07:00
Nishanth Aravamudan	285b3afefa	[ATALK] aarp: replace schedule_timeout() with msleep() From: Nishanth Aravamudan <nacc@us.ibm.com> Use msleep() instead of schedule_timeout() to guarantee the task delays as expected. The current code is not wrong, but it does not account for early return due to signals, so I think msleep() should be appropriate. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:11:44 -07:00
Chuck Short	7abaa27c1c	[IPV4]: Fix route.c gcc4 warnings Signed-off by: Chuck Short <zulcss@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:10:23 -07:00
Jeff Moyer	fbeec2e155	[NETPOLL]: allow multiple netpoll_clients to register against one interface This patch provides support for registering multiple netpoll clients to the same network device. Only one of these clients may register an rx_hook, however. In practice, this restriction has not been problematic. It is worth mentioning, though, that the current design can be easily extended to allow for the registration of multiple rx_hooks. The basic idea of the patch is that the rx_np pointer in the netpoll_info structure points to the struct netpoll that has rx_hook filled in. Aside from this one case, there is no need for a pointer from the struct net_device to an individual struct netpoll. A lock is introduced to protect the setting and clearing of the np_rx pointer. The pointer will only be cleared upon netpoll client module removal, and the lock should be uncontested. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:05:59 -07:00
Jeff Moyer	115c1d6e61	[NETPOLL]: Introduce a netpoll_info struct This patch introduces a netpoll_info structure, which the struct net_device will now point to instead of pointing to a struct netpoll. The reason for this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock should be maintained per net_device, not per netpoll; and 2) this is a first step in providing support for multiple netpoll clients to register against the same net_device. The struct netpoll is now pointed to by the netpoll_info structure. As such, the previous behaviour of the code is preserved. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 22:05:31 -07:00
Eric Dumazet	f31f5f0512	[NET]: dont use strlen() but the result from a prior sprintf() Small patch to save an unecessary call to strlen() : sprintf() gave us the length, just trust it. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-06-22 14:32:51 -07:00
Chuck Lever	ae3884621b	[PATCH] RPC: kick off socket connect operations faster Make the socket transport kick the event queue to start socket connects immediately. This should improve responsiveness of applications that are sensitive to slow mount operations (like automounters). We are now also careful to cancel the connect worker before destroying the xprt. This eliminates a race where xprt_destroy can finish before the connect worker is even allowed to run. Test-plan: Destructive testing (unplugging the network temporarily). Connectathon with UDP and TCP. Hard-code impossibly small connect timeout. Version: Fri, 29 Apr 2005 15:32:01 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:32 -04:00
Chuck Lever	20e5ac828d	[PATCH] RPC: TCP reconnects are too slow When the network layer reports a connection close, the RPC task waiting to reconnect should be notified so it can retry immediately instead of waiting for the normal connection establishment timeout. This reverts a change made in 2.6.6 as part of adding client support for RPC over TCP socket idle timeouts. Test-plan: Destructive testing with NFS over TCP mounts. Version: Fri, 29 Apr 2005 15:31:46 -0400 Signed-off-by: Chuck Lever <cel@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:32 -04:00
Trond Myklebust	0f9dc2b168	[PATCH] RPC: Clean up socket autodisconnect Cancel autodisconnect requests inside xprt_transmit() in order to avoid races. Use more efficient del_singleshot_timer_sync() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:31 -04:00
Trond Myklebust	14b218a8e4	[PATCH] RPC: Ensure rpc calls respects the RPC_NOINTR flag For internal purposes, the rpc_clnt_sigmask() call is replaced by a call to rpc_task_sigmask(), which ensures that the current task sigmask respects both the client cl_intr flag and the per-task NOINTR flag. Problem noted by Jiaying Zhang. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:30 -04:00
Andreas Gruenbacher	9ba02638e4	[PATCH] RPC: Allow the sunrpc server to multiplex serveral programs on a single port The NFS and NFSACL programs run on the same RPC transport. This patch adds support for this by converting svc_program into a chained list of programs (server-side). Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:22 -04:00
Andreas Gruenbacher	bd8100e7ed	[PATCH] RPC: Encode and decode arbitrary XDR arrays Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:20 -04:00
Trond Myklebust	7e06b53d79	[PATCH] RPC: fix accounting bug in the case of a truncated RPC message Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:19 -04:00
Olaf Kirch	e053d1ab62	[PATCH] RPC: Lazy RPC receive buffer allocation Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:19 -04:00
Andreas Gruenbacher	007e251f2b	[PATCH] RPC: Allow multiple RPC client programs to share the same transport Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Acked-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:18 -04:00
Andreas Gruenbacher	cdf477068e	[PATCH] RPC: Return -EPFNOSUPPORT for RPC programs that are unavailable Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:17 -04:00
J. Bruce Fields	6a19275ada	[PATCH] RPC: [PATCH] improve rpcauthauth_create error returns Currently we return -ENOMEM for every single failure to create a new auth. This is actually accurate for auth_null and auth_unix, but for auth_gss it's a bit confusing. Allow rpcauth_create (and the ->create methods) to return errors. With this patch, the user may sometimes see an EINVAL instead. Whee. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:16 -04:00
J. Bruce Fields	438b6fdebf	[PATCH] RPC: Don't fall back from krb5p to krb5i We shouldn't be silently falling back from krb5p to krb5i. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:16 -04:00
Trond Myklebust	96651ab341	[PATCH] RPC: Shrink struct rpc_task by switching to wait_on_bit() Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:07 -04:00
Trond Myklebust	5ee0ed7d3a	[PATCH] RPC: Make rpc_create_client() probe server for RPC program+version support Ensure that we don't create an RPC client without checking that the server does indeed support the RPC program + version that we are trying to set up. This enables us to immediately return an error to "mount" if it turns out that the server is only supporting NFSv2, when we requested NFSv3 or NFSv4. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:04 -04:00
Trond Myklebust	5b616f5d59	[PATCH] RPC: Make rpc_create_client() destroy the transport on failure. This saves us a couple of lines of cleanup code for each call. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:03 -04:00
Trond Myklebust	334ccfd545	[PATCH] RPC: Ensure XDR iovec length is initialized correctly in call_header Fix up call_header() so that it calls xdr_adjust_iovec(). Fix calculation of the scratch buffer length in xdr_init_encode(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:02 -04:00
Trond Myklebust	d05fdb0cec	[PATCH] RPC: Fix a race with rpc_restart_call() If the task->tk_exit() wants to restart the RPC call after delaying then the current RPC code will clobber the timer by calling rpc_delete_timer() immediately after re-entering the loop in __rpc_execute(). Problem noticed by Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2005-06-22 16:07:01 -04:00

1 2 3 4 5 ...

354 Commits